Abstract. Algorithmic self-assembly, a generalization of crystal growth processes, has been
proposed as a mechanism for autonomous DNA computation and for bottom-up fabrication of complex
nanostructures. A 'program' for growing a desired structure consists of a set of molecular 'tiles' 
designed to have specific binding interactions. A key challenge to making algorithmic self-assembly 
practical is designing tile set programs that make assembly robust to errors that occur during initi- 
ation and growth. One method for the controlled initiation of assembly, often seen in biology, is the 
use of a seed or catalyst molecule that reduces an otherwise large kinetic barrier to nucleation. Here 
we show how to program algorithmic self-assembly similarly, such that seeded assembly proceeds 
quickly but there is an arbitrarily large kinetic barrier to unseeded growth. We demonstrate this 
technique by introducing a family of tile sets for which we rigorously prove that, under the right 
physical conditions, linearly increasing the size of the tile set exponentially reduces the rate of spu- 
rious nucleation. Simulations of these 'zig-zag' tile sets suggest that under plausible experimental 
conditions, it is possible to grow large seeded crystals in just a few hours such that less than 1 per- 
cent of crystals are spuriously nucleated. Simulation results also suggest that zig-zag tile sets could 
be used for detection of single DNA strands. Together with prior work showing that tile sets can 
be made robust to errors during properly initiated growth, this work demonstrates that growth of 
objects via algorithmic self-assembly can proceed both efficiently and with an arbitrarily low error 
rate, even in a model where local growth rules are probabilistic.
1. Introduction. Molecular self-assembly is an emerging low-cost alternative to
lithography for the creation of materials and devices with sub-nanometer precision [49]
[25]. Whereas top-down methods such as photolithography impose order externally
(e.g., a mask with a blueprint of the desired structure), bottom-up fabrication by
self-assembly requires that this information be embedded within the chemical processes
themselves. 

Biology demonstrates that self-assembly can be used to create complex objects. 
Organisms produce sophisticated and functional organization from the nanometer 
scale to the meter scale and beyond. Structures such as virus capsids, bacterial 
flagella, actin networks and microtubules can assemble from their purified components, 
even without external direction from enzymes or metabolism. This suggests that 
spontaneous molecular self-assembly can be engineered to create an interesting class 
of complex supramolecular structures. A central challenge is how to create a large 
structure without having to design a large number of unique molecular components. 

Algorithmic self-assembly has been proposed as a general method for engineering 
such structures [50] by making use of local binding affinities to direct the placement
of molecules during growth. The binding of a particular molecule at a particular 
site is viewed as a computational or information transfer step. By designing only a 
modest number of molecular species, which constitute the instructions or program for 
how to grow an object, complex objects can be constructed in principle [38] [47] [10].
Because a self-assembly reaction occurring in a well-mixed vessel is inherently parallel, 
Fig. 1.1: Assembly of DNA strands into DNA tiles and DNA crystal lattices.

The configurations are depicted using the NAMOT modeling program [48]. Stages of
an assembly reaction during an anneal are separated by successive arrows. Strands 
with different sequences are shown in different colors. At high temperatures (first 
stage) strands are free. As the temperature is lowered, strands assemble into tiles 
(second stage). Each tile displays four sticky ends. Example sequences are shown for 
a pair of complementary sticky ends, one on each tile. As the temperature is lowered 
further, tiles successively join to form lattices (third through sixth stages). 



it is necessary to ensure that the molecules that encode the instructions for assembly 
execute these reactions in the correct order. The primary concern of this paper is 
how to design a set of molecules that correctly initiate the execution of a self-assembly 
program. We address this question theoretically, using a model that is commonly used 
to study crystallization [26] , but which incorporates the particularities of algorithmic 
self-assembly. 

To motivate the model we use, we first describe a specific molecular system 
that can implement algorithmic self-assembly experimentally. DNA double crossover 
molecules [15] and related complexes [12] [55] HI] (henceforth, 'DNA tiles') have the
necessary regular structure and programmable affinity to implement algorithmic
self-assembly, and simple periodic [53] [23] [42] and algorithmic [28] [37] [3] [4] self-assembly
reactions have been realized experimentally. As an example, consider one of the DNA
double crossover molecules shown in Figure 1.1, which self-assembles from 4 strands of
synthetic DNA. The sequences have been designed such that the desired pseudoknot- 
ted configuration maximizes the Watson-Crick complementarity. Since the energy 
landscape for folding is dominated by logical complementarity more so than by spe- 
cific sequence details, it is possible to design similar double crossover molecules with 
completely dissimilar sequences. To date, nearly 100 different molecules of this type 
have been synthesized. 

Interactions between DNA tiles are dictated by the base sequences of each of four 
single-stranded overhangs, termed 'sticky ends,' which can be chosen as desired for
each tile type. Tiles assemble through the hybridization of complementary sticky ends. 
The free energy of association for two tiles in a particular orientation is assumed to 
be dominated by the energy of hybridization between their adjacent sticky ends. The 
hybridization energy is favorable when complementary sticky ends bind, but negligible 
or unfavorable for non-complementary sticky ends. The DNA tiles shown assemble
(Figure 1.1) via the binding of sticky ends to four adjacent molecules; repeated binding
between DNA tiles and assemblies can produce a lattice. When multiple tile types 
are present in solution, each site on the growth front of the crystal will preferentially
select from solution a tile that makes the most favorable bonds. Under appropriate
physical conditions, a tile that can attach by two sticky ends will be secured in place,
while tiles that attach by only a single sticky end will usually be rejected due to a
fast dissociation reaction. We call these "favorable" and "unfavorable" attachments,
respectively. 

The design of an algorithmic self-assembly reaction begins with the creation of a 
tile program and its evaluation in an idealized model of tile interaction, the abstract 
tile assembly model (aTAM) [51]. A DNA tile is represented as a square tile with 
labels on each side representing the four sticky ends. Polyomino tiles with labels on 
each unit-length of the perimeter can be used in addition to square tiles, since it is 
possible to generate the corresponding DNA structures. A tile program consists of 
a set of such tiles, the strength with which each possible pair of labels binds, a
designated seed tile, and a strength threshold $\tau$. Under the aTAM, growth starts with
a designated assembly of tiles (usually just the seed tile) and proceeds by allowing
favorable attachments of tiles to occur. That is, tiles may be added where the total
strength of the connections between the tile and the assembly is greater than or equal
to the threshold $\tau$. Addition of tiles is irreversible. At a given step, any allowed
attachment may be performed. An example of a structure that can be constructed
using algorithmic self-assembly, a Sierpinski triangle, is shown in Figure 1.2a.
Beginning with the seed tile, assembly in the aTAM will result in the growth of a
V-shaped boundary that is subsequently (and simultaneously) filled in by "rule tiles"
that obtain their inputs from their bottom sides and present their outputs on their
top sides. The four rule tiles for this self-assembly program have inputs and outputs
corresponding to the four cases in the look-up table for XOR. The assembly of these
tiles therefore executes the standard iterative procedure for building Pascal's triangle
mod 2. While the Sierpinski triangle construction is particularly simple, algorithmic
construction is widely applicable: tile sets for the construction of a variety of desired
products have been described [50] [24] [35] Q] [TUJ [2], including a tile set capable of
universal construction [47].
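The growth rule just described can be illustrated with a small simulation. The following is a minimal sketch, not the tile set of Figure 1.2: the tile names, the labels, and the orientation (boundary along the two positive axes, rule tiles reading inputs from their west and south sides) are assumptions chosen for convenience. Tiles attach only where the total strength of matching bonds reaches the threshold of 2, and the rule tiles implement the XOR look-up table.

```python
from itertools import product

TAU = 2  # strength threshold of the aTAM
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}
STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def sierpinski_tiles():
    """Map tile name -> {side: (label, strength) or None}. Labels are hypothetical."""
    tiles = {
        "seed": {"N": ("v", 2), "E": ("h", 2), "S": None, "W": None},
        "edge_h": {"W": ("h", 2), "E": ("h", 2), "N": ("1", 1), "S": None},
        "edge_v": {"S": ("v", 2), "N": ("v", 2), "E": ("1", 1), "W": None},
    }
    for a, b in product("01", repeat=2):
        out = str(int(a) ^ int(b))  # one row of the XOR look-up table
        tiles[f"rule_{a}{b}"] = {"W": (a, 1), "S": (b, 1),
                                 "N": (out, 1), "E": (out, 1)}
    return tiles


def attachment_strength(tiles, assembly, pos, name):
    """Total strength of matching bonds if tile `name` were placed at `pos`."""
    total = 0
    for side, (dx, dy) in STEP.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is None:
            continue
        mine, theirs = tiles[name][side], tiles[neighbor][OPPOSITE[side]]
        if mine is not None and mine == theirs:
            total += mine[1]
    return total


def grow(size):
    """Grow from the seed inside a size x size window until nothing can attach."""
    tiles = sierpinski_tiles()
    assembly = {(0, 0): "seed"}
    changed = True
    while changed:
        changed = False
        for pos in product(range(size), repeat=2):
            if pos in assembly:
                continue
            for name in tiles:
                if name != "seed" and attachment_strength(tiles, assembly, pos, name) >= TAU:
                    assembly[pos] = name  # attachment is irreversible in the aTAM
                    changed = True
                    break
    return assembly


def cell_bit(name):
    """Output bit of a tile: XOR of the inputs for rule tiles, 1 for seed and boundary."""
    return int(name[-2]) ^ int(name[-1]) if name.startswith("rule_") else 1
```

In this sketch the bit at site (x, y) is the binomial coefficient C(x + y, x) mod 2, so printing `cell_bit` over the grid returned by `grow` shows the Sierpinski pattern.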

The aTAM captures the essential algorithmic mechanisms of generalized crystal 
growth and makes it possible to program self-assembly processes in a straightforward 
way. In contrast to assembly in the aTAM however, the assembly of DNA tiles 
is neither errorless nor irreversible, nor is it guaranteed to start from a seed tile. 
For example, in experimental demonstrations of algorithmic self-assembly [37] [3],
between 1% and 10% of tiles mismatched their neighbors and only a small fraction 
of the observed crystals were properly nucleated from seed molecules. Following [39] , 
Figure 1.2b illustrates how unseeded nucleation and unfavorable attachments can lead
to undesired assemblies. 

To theoretically study the rates at which errors occur, we need a model that
includes energetically unfavorable events. The kinetic tile assembly model (kTAM) [51]
Fig. 1.2: The Sierpinski tile set. (a) Because DNA tiles are generally not rotationally
symmetric, formal tiles cannot be rotated. The lower diagram shows the seeded
growth of the Sierpinski tiles according to the aTAM at $\tau = 2$. The small tiles indicate
the (only) four sites where growth can occur. When growth begins from a seed, no
more than one tile type can attach at each location, so assembly always produces the
same pattern. (b) Errors can result from improper nucleation when assembly does
not begin from the seed tile. Tile sets containing a tile that can polymerize due to
strong bonds are particularly prone to nucleation errors. Improper nucleation can
produce a long facet where a single insufficient attachment can allow a surrounding
block of tiles to attach favorably. Different such blocks of tiles may be incompatible,
leading to an inevitable mismatch at their interface. The straight arrow indicates a
site where such a mismatch must occur. (c) The rates of tile assembly and disassembly
in the kinetic Tile Assembly Model (kTAM). For the growth of an isolated crystal
under unchanging tile concentrations, the forward (association) rate in the kTAM
is $r_f = k_f[\text{tile}] = k_f e^{-G_{mc}}$, while the reverse (dissociation) rate is $r_{r,b} = k_f e^{-bG_{se}}$
for a tile that makes bonds with total strength $b$. Parameters $G_{mc}$ and $G_{se}$ govern
monomer tile concentration and sticky-end bond strength respectively. A representative
selection of possible events is shown here. Attachments with reverse rates $r_{r,1}$
are unfavorable for $G_{mc} > G_{se}$. The kTAM approximates the aTAM with threshold
$\tau$ when $G_{mc} = \tau G_{se} - \epsilon$, for small $\epsilon$. The same set of reactions is favorable or
unfavorable in the two models.



describes the dynamics of assembly according to an inclusive set of reversible chemi- 
cal reactions: a tile can attach to an assembly anywhere that it makes even a weak 
bond, and any tile can dissociate from the assembly at a rate dependent on the total
strength with which it adheres to the assembly (see Figure 1.2c). The kTAM is
a lattice-based model, in which free tiles are assumed to be well mixed in solution 
and effects within the crystal such as bending or pressure differences are ignored. 
The kTAM has been used to study the trade-off between crystal growth rate and the 
frequency of mismatches (errors) in seeded assemblies [HJ [17] . One result of these 
studies is that, in principle, the rate of mismatch errors can be reduced by assem- 
bling crystals more slowly. Analysis of assembly within the kTAM also suggests that 
it is possible to control assembly errors by reprogramming an existing tile set so as
to introduce redundancy. "Proofreading tile sets" [SU [7] [33] [3S] transform a tile set
by replacing each individual tile with a $k \times k$ block of tiles, exponentially reducing
seeded growth errors with respect to the size of the block. These results support
the notion that the aTAM, despite its simplicity, provides a suitable framework for
the design of algorithmic crystal growth behavior, i.e., that any tile program for the
aTAM can be systematically modified to work with arbitrarily low error rates in the 
more realistic kTAM. However, previous work did not adequately address the issue 
of nucleation errors, which requires extending the kTAM from treating only seeded 
growth to treating all reactions occurring in solution. 
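The kTAM rate expressions given in the caption of Fig. 1.2 can be evaluated directly. The sketch below is illustrative only: the function names and the numerical values of $k_f$, $G_{se}$ and $\epsilon$ are assumptions, not values taken from this paper. It shows that when $G_{mc} = 2G_{se} - \epsilon$, a tile held by two bonds dissociates more slowly than tiles arrive, while a tile held by one bond dissociates much faster.

```python
import math


def forward_rate(k_f, g_mc):
    """Association rate r_f = k_f * exp(-G_mc) for a single site."""
    return k_f * math.exp(-g_mc)


def reverse_rate(k_f, g_se, b):
    """Dissociation rate r_{r,b} = k_f * exp(-b * G_se) for total bond strength b."""
    return k_f * math.exp(-b * g_se)


def is_favorable(g_mc, g_se, b):
    """An attachment is favorable when r_{r,b} < r_f, i.e. b * G_se > G_mc."""
    return b * g_se > g_mc


# Hypothetical operating point near threshold 2: G_mc = 2 * G_se - epsilon.
G_SE = 8.5
G_MC = 2 * G_SE - 0.1
for bonds in (1, 2):
    print(bonds, is_favorable(G_MC, G_SE, bonds))
```

Because $k_f$ is common to both rates, favorability depends only on the comparison of $bG_{se}$ with $G_{mc}$, which is why the same reactions are favorable in the aTAM at threshold 2 and in the kTAM near this operating point.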

What is needed is a method of transforming a tile set to reduce the rate of 
nucleation errors without significant slow-down. The transformed tile set must satisfy 
two conflicting constraints: when assembly begins from a seed tile, it must proceed 
quickly and correctly, whereas assembly that starts from a non-seed tile must overcome 
a substantial barrier to nucleation in order to continue. 

How is it possible to have a barrier to nucleation only when no seed is present? 
In a mechanism for the control of 1-dimensional polymerization, found both in biol- 
ogy [44] [9] and engineering [13] , a seed induces a conformational or chemical change to 
monomers, without which monomers cannot polymerize. For example, in spontaneous 
actin polymerization, it is proposed that a trimer occasionally bends to form an incipi- 
ent helix that allows for further growth [33] . The Arp 2/3 protein complex imitates the 
shape of an unfavorable intermediate of the spontaneous actin nucleation process [22].
In contrast, in two- and three-dimensional systems (condensation of a gas [31],
crystallization [30], or the Ising model [45]), classical nucleation theory [56] [TT] predicts
that a barrier to nucleation exists because clusters have unfavorable energies proportional
to the surface area of the cluster (possibly due to interfacial tension or pressure
differences with respect to the surrounding solution), and favorable energies proportional
to the volume of the cluster. Because volume grows more quickly than surface
area, as clusters grow larger, a supersaturated regime exists where small clusters tend 
to melt, but above a critical size, cluster growth rather than melting is favored. In 
some crystalline ribbons or tubes, growth is initially in two dimensions and is disfa- 
vored because of unfavorable surface area/volume interactions, up to the point that 
the full width ribbon or tube has been formed. For these materials, a seed structure 
could allow immediate growth by providing a stable analogue to a full-width assembly.
Protein microtubules [33] and DNA tubes [32] [36] [27] are believed to exhibit this type
of nucleation barrier. 

In this paper we describe a tile set family, the zig-zag tile sets, for the control of 
nucleation during algorithmic self-assembly. Zig-zag tiles can assemble immediately
on a seed tile to grow potentially long ribbons of predefined width. In the absence of a
seed tile, only full-width ribbons can continue to grow exclusively by favorable attachments.
That is, there is a critical size barrier (based on unfavorable surface/volume
energy interactions) that prevents spurious nucleation. By redesigning the tile set
it is possible to increase the width and therefore the critical size. We prove that in
principle this method exponentially reduces the rate at which assemblies without a
seed tile grow large (unseeded growth), while maintaining the rate of growth that
starts from a seed tile and proceeds roughly according to the aTAM (seeded growth).

Used as part of an error-reducing tile set transformation, the zig-zag tiles solve the 
aforementioned problem of controlling nucleation during algorithmic growth. With an 
appropriate seed, zig-zag ribbons can play the same role as the V-shaped boundary in 
Figure 1.2a. Since rule tiles are not likely to spuriously nucleate on their own under
Fig. 2.1: The zig-zag tile set. (a) The width 4 zig-zag tile set and seed tiles. Each
shape represents a single tile. Tiles have matching bonds of strength 1 when the
shapes on their edges match. (b) The ribbon structure formed by the zig-zag tile set.
(c) The L-shaped seed nucleates linear assemblies. (d) The W-shaped seed tile, with
appropriate tiles for vertical zig-zag growth, could nucleate V-shaped assemblies.



optimal assembly conditions [51], once this boundary has set up the correct initial
information, algorithmic self-assembly will proceed with few spurious side products. 

In Section 2 we describe the zig-zag tile set family in detail. In Section 3 we
introduce a variant of the kTAM that is appropriate for the study of nucleation. In
Section 4 we analyze thermodynamic constraints on ribbon growth in our model. In
Section 5 we prove our main theorem, that the rate of spurious nucleation decreases
exponentially with the width of the zig-zag tile set. In contrast, the speed of seeded 
assembly decreases only linearly with width. Thus, for a given volume we can con- 
struct a tile set such that no spurious nucleation is expected to occur during assembly. 
This illustrates how the logical redesign of molecules can be qualitatively more effec- 
tive in preventing undesired nucleation than just controlling physical quantities such 
as temperature and monomer concentration. In Section 6 we use simulations to
provide numerical estimates of nucleation rates. These estimates suggest that reasonably
sized zig-zag tile sets can be expected to be effective in the laboratory. 

2. The Zig-Zag Tile Set. A self-assembly program is a set of tiles that assembles
into a desired shape or set of shapes. The zig-zag tile set (Figure 2.1a) of width
$k$ contains tiles that assemble into a periodic ribbon of width $k$ (Figure 2.1b). Zig-zag
tile sets of widths $k \ge 2$ can be constructed. A zig-zag tile set includes a top tile
and a bottom tile, each having the same shape as 2 horizontally connected square
tiles. Each of the $k - 2$ rows between the top and bottom tiles contains two unique
middle tiles that alternate horizontally. The alternation of two tile types along the
columns enforces the column-wise staggering of the top and bottom tiles. Each tile
label has exactly one match on another tile type, so the tiles cannot assemble to form
any other structures held together by sticky end bonds.
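The composition described above implies that the number of tile types grows linearly with the ribbon width: one top tile, one bottom tile, and two middle tiles for each of the $k - 2$ interior rows, for $2k - 2$ types in total. The following sketch simply enumerates them; the tile names are hypothetical placeholders and carry no bond information.

```python
def zigzag_tile_types(k):
    """List hypothetical names for the tile types of a width-k zig-zag tile set."""
    if k < 2:
        # A ribbon needs at least the row of top tiles and the row of bottom tiles.
        raise ValueError("width must be at least 2")
    names = ["top", "bottom"]  # each is a 2x1 double tile
    for row in range(1, k - 1):  # the k - 2 rows between top and bottom
        names += [f"middle_{row}_a", f"middle_{row}_b"]  # two alternating types per row
    return names


for width in (4, 6, 8):
    print(width, len(zigzag_tile_types(width)))
```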

The tile set is designed to operate in a physical regime where the attachment of 
a tile to another tile or assembly by two matching sides is energetically favorable, but 
an attachment by only one bond is energetically unfavorable. In the aTAM, these 
conditions translate to growth with a threshold of 2. Growth beginning from any 
Fig. 2.2: Zig-zag tile set growth. (a) Seeded growth of a zig-zag tile set in the
aTAM. The same growth pattern occurs reversibly in the kTAM when $G_{mc} = 2G_{se} - \epsilon$.
(b) Unseeded growth. A possible series of steps by which the tiles could spuriously
nucleate in the kTAM.



non-seed tile in the zig-zag tile set goes nowhere in the aTAM: no two tiles can join
by a bond with a strength of at least 2. In contrast, growth can proceed from an
L-shaped or W-shaped seed tile (Figures 2.1c and 2.1d). Figure 2.2a illustrates the
only possible growth path in the aTAM from the L-shaped seed. The staggering of 
the top and bottom tiles allows growth to continue indefinitely along a zig-zag path. 
Note that the top and bottom tiles alternately provide the only way to proceed to 
each successive column. Assemblies that do not span the full width (k tiles) either 
cannot bind top tiles or cannot bind bottom tiles, and thus cannot grow indefinitely. 
Growth from a seed tile of less than full width would stall. For example, with a seed 
tile of width $k - 1$, the top tile could not attach by two bonds to the assembly.

In the kTAM, seeded growth occurs in the same pattern as in the aTAM. Unlike 
in the aTAM, however, there are also series of reactions that can produce a full 
width assembly in the absence of a seed tile. The formation of such an assembly is 
called a spurious nucleation error. An example of such unseeded growth is shown 
in Figure 2.2b. Under the conditions of interest, some steps in spurious nucleation
are energetically favorable, but at least $k - 1$ must be unfavorable before the full-width
assembly is formed. Once the full-width assembly is formed, further growth is
favorable. Spurious nucleation is a transition from assembly melting, where assemblies 
are more likely to fall apart than they are to get larger, to assembly growth, where each 
assembly step is energetically favorable. Any assembly where melting and growth are 
both energetically favorable is called a critical nucleus. 

Classical nucleation theory [56] [TT] predicts that the rate of nucleation is limited
by the concentration of the most stable critical nucleus, $[A_c]$. Intuitively, because more
unfavorable reactions are required to form critical nuclei in a wider zig-zag tile set,
$[A_c]$ should decrease exponentially with $k$. This argument is not rigorous, however,
because the number of critical nuclei for a zig-zag tile set also increases with k. The 
rate of spurious nucleation is proportional to the sum of the concentrations of all these 
critical nuclei. We will show in the following sections that despite the increase in the 
number of kinds of critical nuclei that can form as k increases, under many conditions 
nucleation rates do decrease exponentially with k. 

3. The Self-Assembly Model. To analyze the process of tile assembly, we
formally describe the mass-action kTAM. For a given tile set, the kTAM describes the
set of possible assemblies, their reactions and the dynamics of these reactions. The
kTAM has been previously used to analyze complex tile programs [38] ITT), and is a
general framework for understanding algorithmic self-assembly. Here, we extend the
kTAM to include polyomino tiles. We also introduce a variant of the kTAM in which
the concentrations of all possible assemblies are considered. This is in contrast to
the original kTAM, which tracks only a single, seeded assembly. Our extension is
appropriate for studying nucleation, where growth can begin from any tile. Also in
contrast to previous work with the kTAM, which used stochastic chemical kinetics,
we introduce mass-action kinetics below. Both mass-action and stochastic kinetics
are accepted models of chemical kinetics [12], but mass-action is more tractable
analytically and the results of both models generally converge when large populations
of molecules are considered. In Section 5 we find that simulations of nucleation using
stochastic kinetics are consistent with the bounds on nucleation rates we prove using 
the mass-action model. 

A tile type $t$ consists of a shape and a set of bond types on each unit edge of
the shape. The shape is either a unit square or a polyomino, a finite, connected set
of unit squares.¹ The set of possible bond types is referred to as $S$. A set of tile
types is denoted by $T$. A tile (as contrasted with a tile type) is a tuple of a tile type
and a location, which is specified by $L = (x, y)$, where $L$ is the coordinate location
of the leftmost top unit square within the polyomino. The set of tiles (all possible
tile types in all possible locations) is referred to as $\mathcal{T}$. A translation of a tile has the
same tile type as the original. Tiles cannot be rotated. Tiles that abut vertically or
horizontally are bound if they have the same labels on the abutting sides. A set of
tiles is bound if there is a path of bound tiles between any two tiles in the set.
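The definitions of bound tiles and bound sets can be made concrete with a short sketch. It makes several simplifying assumptions that are not part of the model above: every tile is a unit square (no polyominoes), a tile type is a dictionary from side to label, and `None` marks a side with no bond type. A set of tiles is then bound exactly when the graph whose edges are the bound pairs is connected.

```python
from collections import deque

STEP = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def bound_pairs(tile_types, tiles):
    """Return the pairs of locations whose abutting sides carry the same label.

    tile_types: name -> {side: label or None}; tiles: iterable of (name, (x, y)).
    """
    occupied = {loc: name for name, loc in tiles}
    pairs = set()
    for name, (x, y) in tiles:
        for side, (dx, dy) in STEP.items():
            neighbor_loc = (x + dx, y + dy)
            neighbor = occupied.get(neighbor_loc)
            label = tile_types[name][side]
            if (neighbor is not None and label is not None
                    and label == tile_types[neighbor][OPPOSITE[side]]):
                pairs.add(frozenset({(x, y), neighbor_loc}))
    return pairs


def is_bound_set(tile_types, tiles):
    """True if the tiles do not overlap and are connected by a path of bound tiles."""
    locations = [loc for _, loc in tiles]
    if not locations or len(set(locations)) != len(locations):
        return False
    links = {loc: set() for loc in locations}
    for pair in bound_pairs(tile_types, tiles):
        a, b = tuple(pair)
        links[a].add(b)
        links[b].add(a)
    seen, queue = {locations[0]}, deque([locations[0]])
    while queue:  # breadth-first search over bound pairs
        for nxt in links[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(locations)
```

For example, two tiles whose facing sides share a label form a bound set, while adding a third tile that merely abuts without a matching label does not.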

An assembly $A$ is an equivalence class with respect to translation of a non-overlapping,
bound finite set of one or more tiles. The set of assemblies is denoted by
$\mathcal{A}$ and the set of assemblies consisting of two or more tiles is denoted $\mathcal{A}_{2+}$. We will
also use the notation for the set of tiles, $\mathcal{T}$, to refer to the assemblies that have only
one tile, i.e. $\mathcal{A} - \mathcal{A}_{2+}$. A set of tiles $\tilde{A}$ is considered the canonical representation of
$A$ if $\tilde{A} \in A$ and

$$
\begin{aligned}
&\forall (t, (x_i, y_i)) \in \tilde{A},\ x_i \ge 0 \text{ and } y_i \ge 0, \text{ and} \\
&\exists y, t' \text{ s.t. } (t', (0, y)) \in \tilde{A}, \text{ and} \\
&\exists x, t'' \text{ s.t. } (t'', (x, 0)) \in \tilde{A}.
\end{aligned}
$$

That is to say, the canonical representation uses a coordinate system such that 
the reference locations of the tiles just fit into the upper right quadrant of the plane
with no negative coordinates. Note that polyomino tiles may extend into the other 
three quadrants, so long as the location of the leftmost top unit of each polyomino is 
in the first quadrant. For an assembly A, 

$$
\mathrm{width}(A) = \max_{x} |y_1 - y_2| + 1, \text{ such that } (t_1, L_1), (t_2, L_2) \in A,\ (x, y_1) = L_1,\ (x, y_2) = L_2.
$$

This definition measures width with respect to the reference points for polyomino 
tiles, ignoring the extent of the other unit squares within the polyomino. length(A),

1 Here connected means that every unit square in the polyomino must have at least one side that 
abuts the side of another unit square in the polyomino. That is, the polyomino's component squares 
cannot be merely diagonally touching. 
is defined analogously. Note that the definitions of length and width given here are
designed to maximize the clarity of the analysis that follows, and may not be appropriate
for other analyses of tile assembly. The addition relation is defined between an
assembly $A \in \mathcal{A}_{2+}$ and a tile $t$ so that $A + t = B$ if and only if $A$ and $t$ are bound but
non-overlapping, and $A \cup t$ is a member of equivalence class $B$. For the attachment
of two tiles to each other, we need to be careful to correctly count the number of
ways tiles can attach.² We consider the set of tile types $T$ to be listed in some order.
The addition relation is defined between two tiles $\mathbf{t}_1 = (t_1, L_1)$ and $\mathbf{t}_2 = (t_2, L_2)$ if
either $t_1$ comes before $t_2$ in the ordering of tile types, or $t_1 = t_2$ and, for $L_1 = (x_1, y_1)$ and
$L_2 = (x_2, y_2)$, either $y_1 < y_2$ or $x_1 < x_2$ and $y_1 = y_2$; and $\mathbf{t}_1 \cup \mathbf{t}_2 = A$ for some $A \in \mathcal{A}$.
In this case, $\mathbf{t}_1 + \mathbf{t}_2 = A$.
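The canonical representation and the width and length measures defined earlier in this section can be sketched as follows. The sketch assumes that an assembly is given as a set of (tile type name, reference location) pairs; the function names are hypothetical, and `length` follows the assumption that "analogously" means exchanging the roles of the two coordinates.

```python
def canonical(tiles):
    """Translate so all reference coordinates are >= 0 and both axes are touched."""
    min_x = min(x for _, (x, _) in tiles)
    min_y = min(y for _, (_, y) in tiles)
    return {(name, (x - min_x, y - min_y)) for name, (x, y) in tiles}


def width(tiles):
    """Largest |y1 - y2| + 1 over pairs of tiles sharing the same x coordinate."""
    columns = {}
    for _, (x, y) in tiles:
        columns.setdefault(x, []).append(y)
    return max(max(ys) - min(ys) + 1 for ys in columns.values())


def length(tiles):
    """Assumed analogue of width: largest |x1 - x2| + 1 within a single row."""
    rows = {}
    for _, (x, y) in tiles:
        rows.setdefault(y, []).append(x)
    return max(max(xs) - min(xs) + 1 for xs in rows.values())


example = {("a", (3, 5)), ("b", (3, 6)), ("c", (4, 5))}
print(sorted(canonical(example)), width(example), length(example))
```

Because only reference locations are used, both measures ignore the extent of the remaining unit squares of a polyomino, as the definition above specifies.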

Bound tiles have a bond between them. The standard free energy, $G^\circ$, of an
assembly $A$ is defined as $G^\circ(A) = -bG_{se}$, where $b$ is the number of bonds in the
assembly and $G_{se}$ (the sticky end energy) is the unitless free energy of a single bond.
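As a worked instance of this definition, the sketch below counts bonds and evaluates $G^\circ(A) = -bG_{se}$ under a strong simplifying assumption: every pair of abutting unit-square tiles is taken to be bound, so the bond count depends only on the occupied locations. The function names and the value of $G_{se}$ used in the example are illustrative.

```python
def count_bonds(locations):
    """Number of abutting pairs, assuming every abutting pair of tiles is bound."""
    cells = set(locations)
    # Count each pair once by looking only to the east and to the north.
    return sum(((x + 1, y) in cells) + ((x, y + 1) in cells) for x, y in cells)


def standard_free_energy(locations, g_se):
    """G°(A) = -b * G_se for an assembly occupying the given locations."""
    return -count_bonds(locations) * g_se


block = [(0, 0), (1, 0), (0, 1), (1, 1)]  # a 2 x 2 block has 4 bonds
print(count_bonds(block), standard_free_energy(block, 8.5))
```

A single tile has no bonds, so its standard free energy is 0 under this convention.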

The dynamics in the kTAM consists of a set of reactions in which assemblies grow 
larger or smaller. 

In this paper, we consider all possible accretion reactions: reactions either between 
two tiles or between a tile and an assembly. We also assume that the number of 
available single tiles does not change during the course of assembly (i.e. the reaction 
is "powered" by some process or circumstance that keeps monomer concentrations 
constant). 

Formally, the set of powered accretion reactions is

$$
\begin{aligned}
R = {} & \{A + t \to B + t,\ B \to A : A, B \in \mathcal{A}_{2+},\ t \in \mathcal{T},\ A + t = B\}\ \cup \\
& \{t_1 + t_2 \to A + t_1 + t_2,\ A \to \emptyset : t_1, t_2 \in \mathcal{T},\ A \in \mathcal{A}_{2+},\ t_1 + t_2 = A\}
\end{aligned}
$$

The appearance of single tiles on both sides of the association reactions and
neither side of the dissociation reactions reflects the powered model's assumption
that the number of single tiles remains constant.

In the mass-action kTAM, the dynamics of an assembly process are governed by 
mass-action kinetics. Mass-action kinetics is based on an ideal situation where tiles 
and assemblies exist in infinite quantities, and move at random through a solution 
of infinitely large volume. To distinguish the relative abundance of tile types and 
assemblies in the system, we use the notion of concentration, which denotes the 
number of copies of the relevant tile type or assembly within a unit volume. The 
concentration of species A is denoted [A] . 

Tiles and assemblies form or are consumed because of reactions that happen 
spontaneously or as a result of collisions between the reactants. This leads to the 

2 This definition is crafted to correctly count the number of distinct ways in which tiles can attach 
to each other such that first, the system will satisfy detailed balance given the free energies assigned 
to tiles and assemblies at the end of this section, and second, the dynamics of tile interaction will be 
unchanged if tiles are given irrelevant markings — e.g., if a new tile, with the same binding labels as 
an existing tile, is added and the concentrations of both new and old tiles are half that of the original,
then in the new system the total concentration of both tile types will have the same dynamics as the 
original tile's concentration in the original system. The definition can be examined by considering 
the number of ways in which different tiles can attach to each other. Two tiles of the same type with 
the same label on all four sides can attach in exactly two distinct ways, two tiles of different type 
but with the same label on all four sides can attach in exactly eight ways, and two tiles of different 
types for which the left side of the first tile matches the right side of the second tile, but such that 
all other bonds are non-matching, can attach in exactly one way. 






concentrations of assemblies changing over time. In a physical reaction vessel, an 
association reaction (a reaction where multiple species interact) occurs at a rate pro- 
portional to the frequency with which all the species involved come into physical 
contact. When the possible reactants are well-mixed and moving randomly through
solution, the frequency with which such contact occurs is proportional to the prod- 
uct of the concentrations of all the reactant species. Likewise, dissociation reactions, 
which have only one reactant, occur randomly with constant probability per time 
unit per molecule. Even though individual reactions occur stochastically, when the 
number of particles is infinite, the total reaction rate is deterministic. 

These observations lead to mass-action kinetics, which is an idealized model of 
chemical reactions in a well-mixed vessel [12]. The proportionality constant that
relates the product of the concentrations of the reactant species to the rate at which
the reaction occurs is a rate constant. In general, for a chemical reaction
$\sum_i n_i S_i \to \sum_j m_j S_j$ with rate constant $k$, where $S_i$ are chemical species and $n_i, m_j \in \mathbb{Z}_{\ge 0}$ are
the reactant and product stoichiometries (the number of times the reactant or product
species occurs), mass-action dynamics [12] predict $\frac{d[S_j]}{dt} = k(m_j - n_j) \prod_i [S_i]^{n_i}$. Mass-action
reactions occur in parallel, so that dynamics add linearly for multiple reactions.
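The mass-action rate law for a single reaction can be written as a short function. This is a minimal sketch with hypothetical names: reactant and product stoichiometries are passed as dictionaries, and the function returns the contribution of this one reaction to the time derivative of each species. Contributions from several reactions would simply be added.

```python
def mass_action_rates(k, reactants, products, conc):
    """Return d[S]/dt contributed by one reaction under mass-action kinetics.

    reactants, products: species -> stoichiometry; conc: species -> concentration.
    """
    flux = k
    for species, n in reactants.items():
        flux *= conc[species] ** n  # product of reactant concentrations
    involved = set(reactants) | set(products)
    return {s: (products.get(s, 0) - reactants.get(s, 0)) * flux for s in involved}


# A powered association reaction A + t -> B + t: the tile appears on both sides,
# so its concentration is unchanged while A is consumed and B is produced.
rates = mass_action_rates(
    k=2.0,
    reactants={"A": 1, "t": 1},
    products={"B": 1, "t": 1},
    conc={"A": 0.5, "t": 0.1, "B": 0.0},
)
print(rates)
```

The example illustrates why, in the powered model, writing the single tile on both sides of an association reaction keeps the monomer concentration constant.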

In the kTAM, each reaction has a forward rate constant $k_f$ that we assume to be
the same for all reactions, and a backward rate constant $k_r = k_f e^{-\Delta G^\circ}$, where $\Delta G^\circ$ is
the difference between the sum of the unitless standard free energies of the reactants
and that of the products (where the standard free energy of a single tile is 0). The
concentration of all tile types is held at $e^{-G_{mc}}$. (Identical concentrations are considered
for convenience only; Appendix A shows how our formalism can be extended
trivially to treat reactions where species have different concentrations.) Assemblies
consisting of more than a single tile have an initial concentration of 0. Thus, for an 
assembly A at time point s, 



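$$\frac{d[A]}{ds} = k_f \Bigg( \sum_{\substack{A+t \to B+t, \\ B \to A}} \left( e^{G^\circ(B) - G^\circ(A)} [B] - [A] e^{-G_{mc}} \right) + \sum_{\substack{B+t \to A+t, \\ A \to B}} \left( [B] e^{-G_{mc}} - e^{G^\circ(A) - G^\circ(B)} [A] \right) + \sum_{\substack{t_1+t_2 \to A+t_1+t_2, \\ A \to \emptyset}} \left( e^{-2G_{mc}} - e^{G^\circ(A)} [A] \right) \Bigg) \tag{3.1}$$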



Each term in the first summation is the difference between the rate at which A and 
a tile react to form a larger assembly B and the rate at which the larger assembly B 
decomposes into A and a tile. Each term in the second summation is the difference 
between the rate of formation of A by a reaction where a single tile binds to a smaller 
assembly B, and the rate of decomposition of A into assembly B and a single tile. The
terms in the final summation are the rate of formation of A from two single tiles and 
its dissociation into two single tiles. These final terms are nonzero only if A is an 
assembly composed of exactly two tiles. In the remainder of this paper, we refer to 
the mass-action kTAM with powered accretion reactions as simply "the kTAM." 

The free energy $G(A)$ (in contrast to the standard free energy $G^\circ(A)$) reflects both
the entropy loss due to crystal formation and the enthalpy gain of assembly. For an assembly
$A$ with $n$ tiles and $b$ bonds, it is defined as $G(A) = G^\circ(A) + nG_{mc}$. The steady
state concentration of an assembly $A$ is given by $[A]_{ss} = e^{-G(A)} = e^{bG_{se} - nG_{mc}}$. Recall
that $G_{mc} > 0$ and $G_{se} > 0$, so that the energetic penalty of adding an additional

Fig. 4.1: Physical conditions where zig-zag polymer elongation is favorable.
$G_{mc}$ (ln(tile concentration)) and $G_{se}$ (bond strength) define a set of physical conditions
for zig-zag tile assembly, $\tau = \frac{G_{mc}}{G_{se}}$. (a) Phase diagram of the width 4 zig-zag
tile set. In phase A, above the line $\tau = 2$, no assembly reactions are favorable,
whereas in regimes B, C, D, E and F, progressively more types of assemblies (shown in
(b)) become favorable. (b) The polymeric assemblies which become favorable in the
regimes B-F shown in (a). Polymers shown for earlier regimes are also favorable in
later regimes: the polymer shown for regime B is favorable in regimes C-F and so on.
(c) The assemblies that can form from a zig-zag tile set of width $k$ and the physical
conditions (in terms of $\tau$) in which these assemblies become favorable.

tile can be compensated for by forming sufficiently many new bonds. A smaller G(A) 
is more favorable, and corresponds to a higher steady state concentration. 

This model satisfies detailed balance within $\mathcal{A}_{2+}$. That is, for all reaction pairs
$B \to A$ and $A + t \to B + t$, $k_f[t][A]_{ss} = k_r[B]_{ss}$, where $k_f$ and $k_r$ are the forward and
reverse rates in the respective reactions, and for reaction pairs $t_1 + t_2 \to A + t_1 + t_2$ and $A \to \emptyset$,
$k_f[t_1][t_2] = k_r[A]_{ss}$. A proof that the kTAM satisfies detailed balance is contained in
Appendix A.
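
The detailed balance condition can be checked numerically for a single accretion step. The sketch below is an illustration under assumed values: the assembly sizes, bond counts, and the values of $G_{mc}$, $G_{se}$ and $k_f$ are hypothetical, and the helper names are not from the paper. It uses only $G(A) = nG_{mc} - bG_{se}$, $[A]_{ss} = e^{-G(A)}$, and a reverse rate of $k_f e^{-2G_{se}}$ for a tile held by two bonds.

```python
from math import exp, isclose

def free_energy(n, b, g_mc, g_se):
    """G(A) = n*G_mc - b*G_se for an assembly with n tiles and b bonds."""
    return n * g_mc - b * g_se

def steady_state_conc(n, b, g_mc, g_se):
    """[A]_ss = exp(-G(A))."""
    return exp(-free_energy(n, b, g_mc, g_se))

# Hypothetical physical conditions (tau = 17/9, just below 2).
g_mc, g_se, k_f = 17.0, 9.0, 1.0
tile_conc = exp(-g_mc)

# Hypothetical assemblies: A has 4 tiles and 4 bonds; B is A plus one
# tile attached by 2 bonds.
a_ss = steady_state_conc(4, 4, g_mc, g_se)
b_ss = steady_state_conc(5, 6, g_mc, g_se)

k_r = k_f * exp(-2 * g_se)            # reverse rate: two bonds are broken
forward_flux = k_f * tile_conc * a_ss  # A + t -> B + t at steady state
reverse_flux = k_r * b_ss              # B -> A at steady state
balanced = isclose(forward_flux, reverse_flux, rel_tol=1e-9)
```

The value of `k_f` cancels from both sides, so the check depends only on the free energies.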

4. Thermodynamics of Zig-Zag Assemblies. To prove that nucleation rates 
of zig-zag ribbons decrease exponentially as their widths increase, we would first like 
to identify the critical nuclei for spurious nucleation. Thermodynamic constraints 
provide a powerful tool: Because undesirable assemblies have unfavorable energies, 
we can conclude that they occur rarely without having to consider rates. (In contrast, 
assemblies with favorable energies may or may not form quickly, depending upon 
details of the kinetics; such analyses form the bulk of Sections 5 and 6.)

We therefore consider the free energy landscape, where each point in the landscape

Fig. 4.2: Zig-zag polymerization reactions. The addition of a polymer unit to
a thin assembly consists of an initial unfavorable accretion reaction followed by a
series of favorable accretion reactions. (a) A favorable polymerization reaction. The
positive free energy change from the four favorable accretion reactions is larger than
the negative energy change from the initial unfavorable accretion reaction. Thus, the
elongation of polymers of width 3 is favorable when $\tau = 1.75$. (b) An unfavorable
polymerization reaction. The positive free energy change from the two favorable
accretion reactions is not large enough to compensate for the negative energy change
from the initial unfavorable accretion reaction, so that the elongation of polymers of
width 2 is unfavorable when $\tau = 1.75$.



corresponds to a particular type of assembly. Optimal control over nucleation is
achieved in a regime where zig-zag growth is favorable, but the growth of less than 
full-width (thin) assemblies is unfavorable. 

Within the kTAM, the energy landscape for assemblies is formally described by
the free energy $G(A) = nG_{mc} - bG_{se}$, which can be evaluated directly for any given
assembly $A$. $G_{se}$ and $G_{mc}$ describe the physical conditions for assembly. Changing
$G_{se}$ and $G_{mc}$ can bring the system into two qualitatively different phases. In the
melted phase, $G(A)$ is bounded below by $G_{mc}$ for all $A$, meaning that no assembly
has a concentration of more than $e^{-G_{mc}}$ at steady state. In contrast, in the crystalline
phase, $G(A)$ can continue to decrease without bound (so $[A]_{ss} = e^{-G(A)}$ can increase
without bound) as certain polymeric assemblies become longer and longer; that is,
adding a repeat unit to the assembly strictly decreases its free energy.³ Within the
crystalline phase, there are regimes where the elongation of different types of polymers
is favorable or unfavorable. To ensure that thin polymers do not tend to grow, it is
enough to show that for each of these polymer types, longer polymers have a higher
free energy than shorter ones.

To characterize the energy landscape formally, we consider the important classes
of polymeric assemblies and evaluate their free energies. Figure 4.1(b), B-F, shows the 6
main types of polymeric assemblies⁴ for ribbons of width 4 by indicating the repeat
group that may be added (by a series of accretion reactions, as shown in Figure 4.2) to



3 In powered models, formal steady state concentrations can continually increase. This may seem 
nonphysical, but it is not problematic; it reflects the fact that providing unbounded materials can 
lead to an unbounded accumulation of product, and that longer polymers do not achieve steady state 
within the time during which the powered model is an appropriate model. 

4 "Imperfect" long assemblies, such as an assembly with more tiles in one column than another, 
can be considered as a member of the class corresponding to a more complete assembly of the same 
length and width. Since removing tiles from a "perfect" assembly strictly increases its free energy, 
these "imperfect" assemblies have strictly lower concentrations than their corresponding "perfect" 
assembly.

Fig. 4.3: Reactions that increase width, the $\Delta G$ for those reactions, and the
resulting conditions (in terms of $\tau$) where those reactions are favorable.



extend the polymer. To determine whether adding a repeat group results in a higher
or lower energy assembly, we evaluate $\Delta G = G(A_{m+1}) - G(A_m) = \Delta n\,G_{mc} - \Delta b\,G_{se}$,
where $A_m$ is a polymeric assembly with $m$ repeat units. If $\Delta G$ is negative, then
longer polymeric assemblies of this type are more favorable and we can expect this
kind of assembly to grow at some rate. This gives a linear condition on $G_{se}$ and $G_{mc}$,
specifying a regime of physical conditions in which a certain class of long assembly is
favorable. For example, for polymer type E, each repeat unit adds 4 tiles ($\Delta n = 4$)
and 6 bonds ($\Delta b = 6$), so these polymers grow if $4G_{mc} - 6G_{se} < 0$, i.e. $\frac{G_{mc}}{G_{se}} < \frac{3}{2}$.
Similar calculations result in the phase diagram shown in Figure 4.1(a), which shows
the melted phase A, in which no polymers are favorable, and the crystalline phase
divided into regimes B-F wherein one additional type of polymer becomes favorable
in each successive regime. In all these calculations, the ratio $\tau = \frac{G_{mc}}{G_{se}}$ plays a critical
role.
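
The elongation condition is simple enough to compute directly. The following sketch, with hypothetical function names, reproduces the polymer type E example: a repeat unit with $\Delta n = 4$ and $\Delta b = 6$ is favorable exactly when $G_{mc}/G_{se} < 3/2$.

```python
from fractions import Fraction

def delta_g(delta_n, delta_b, g_mc, g_se):
    """Free energy change from adding one repeat unit to a polymer."""
    return delta_n * g_mc - delta_b * g_se

def elongation_threshold(delta_n, delta_b):
    """Value of G_mc/G_se below which adding a repeat unit lowers G.

    delta_n*G_mc - delta_b*G_se < 0  <=>  G_mc/G_se < delta_b/delta_n.
    """
    return Fraction(delta_b, delta_n)

# Polymer type E from the text: 4 tiles and 6 bonds per repeat unit.
type_e = elongation_threshold(4, 6)
```

Using `Fraction` keeps the thresholds exact, which is convenient when comparing the boundaries of neighboring regimes.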

Figure 4.1(c) shows the $2k - 3$ classes of polymeric assemblies for the width $k$ zig-zag
tile set (excluding the full width ribbons) along with the condition on $\tau = G_{mc}/G_{se}$ that
determines when polymer elongation is favorable. Only the elongation of full-width
ribbons is favorable when $2 > \tau > \frac{4(k-2)+1}{2(k-2)+1} = 2 - \frac{1}{2k-3}$. That is, when
$2 > \tau > 2 - \frac{1}{2k-3}$, zig-zag growth is favorable, but the elongation of all less than
full-width polymers is unfavorable. The regime where $2 > \tau > 2 - \frac{1}{2k-3}$ will be referred
to as optimal nucleation control conditions.
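
As a quick illustration, the window of optimal nucleation control conditions can be evaluated for a given width. The function names below are hypothetical. For width 4 the window is $(9/5, 2)$, so the ratio 1.75 used in Figure 4.2 lies outside it, consistent with width 3 polymers being able to elongate there.

```python
from fractions import Fraction

def thin_polymer_threshold(k):
    """Ratio G_mc/G_se below which width k-1 polymers elongate: 2 - 1/(2k-3)."""
    return 2 - Fraction(1, 2 * k - 3)

def in_optimal_nucleation_control(ratio, k):
    """True when only full-width (width k) ribbons elongate favorably."""
    return thin_polymer_threshold(k) < ratio < 2

width_4_lower_edge = thin_polymer_threshold(4)
```

The window narrows as $k$ grows, since its lower edge $2 - \frac{1}{2k-3}$ approaches 2.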

The table in Figure 4.3 enumerates the assemblies for which growing wider (rather
than longer) is favorable. Like the polymerization of thin ribbons, a reaction to produce
a wider assembly from a thinner one consists of an initial unfavorable accretion
reaction followed by a series of favorable accretion reactions to complete the new row.
The number of favorable reactions determines the values of $\tau$ for which the overall
reaction is favorable. Very long but thin assemblies can favorably grow wider even
when $\tau$ is close to 2, so for optimal nucleation control it is necessary that elongation
of thin assemblies be unfavorable. Otherwise, a favorable path to nucleation exists:
an assembly can grow longer until it is favorable for it to grow wider and then it can
grow to full width.

An example of the difference in the energy landscape between the regime where
only the elongation of full width polymers is favorable (optimal nucleation control
conditions), and a regime where the growth of thinner polymers is also favorable can be
seen in Figure 4.4. In each landscape, the critical nuclei divide the energy landscape
into two basins whose lowest energy assemblies are infinite polymers or fully melted,
respectively. A critical nucleus can, via a series of energetically favorable increases
or decreases in length or width, either reach full width or melt away. The principal
critical nucleus is the most stable critical nucleus. We start by considering the two
landscapes under optimal nucleation control conditions, the two left landscapes of
Figure 4.4. In these landscapes, the critical nuclei are of width $k - 1$ (or width $k$)
for both tile set widths and the most favorable path to nucleation for both tile sets
is for a crystal of length 2 to grow to full width. Thus, the barrier to nucleation for
a tile set of width 8 is higher than the barrier to nucleation for a tile set of width
4. In contrast, when $\tau = 1.77$, the principal critical nucleus is the same for both tile
sets: it is an assembly of width 3 and length 4. Under these conditions, the spurious
nucleation rate of the tile set of width $k = 8$ will not be appreciably smaller than
the nucleation rate of the tile set of width $k = 4$.

The primary theorem of the next section will apply only under optimal nucleation
control conditions. While this region covers only a small fraction of area in the phase
diagram shown in Figure 4.1, a slow anneal from a high temperature where $\tau > 2$
to a temperature in which $\tau < 1$ will pass through this regime, and a slow enough
anneal will allow the bulk of the reaction to take place in this regime. Therefore,
it is reasonable to consider a mechanism for the control of nucleation which is valid
only in this narrow range of physical conditions. In the next section, we analyze the
nucleation rates of the zig-zag tile set within this regime.

5. An Asymptotic Bound on Spurious Nucleation Rates. The kinetic Tile
Assembly Model predicts the concentration of each assembly at all times. For most
tile sets, the number of possible assemblies is large, and the individual concentrations
of many kinds of intermediate assemblies are not necessarily of interest. However, the
sheer number of possible assemblies and possible assembly pathways can significantly
affect the overall rate of spurious nucleation, and they cannot simply be ignored.
Understanding the contribution of many different assembly types to the total spurious
nucleation rate is the main technical challenge in what follows. It is often helpful to
talk about the concentration of a class $\mathcal{C} \subset \mathcal{A}$ of assemblies, $[\mathcal{C}] = \sum_{A \in \mathcal{C}} [A]$. The
derivative of the concentration of a class of assemblies, $\frac{d[\mathcal{C}]}{ds} = \sum_{A \in \mathcal{C}} \frac{d[A]}{ds}$, can be
calculated as the difference between the rate at which assemblies join the
class and that at which they leave the class. Reactions which produce new members
of the class from assemblies not in the class are the inward perimeter reactions,
$R^{in} = \{A + t \to B + t,\ A \to B,\ t_1 + t_2 \to B + t_1 + t_2 : A \notin \mathcal{C},\ B \in \mathcal{C}\}$. Reactions which
use up members of the class to produce assemblies not in the class (or single tiles)
are the outward perimeter reactions, $R^{out} = \{B + t \to A + t,\ B \to A,\ B \to \emptyset :
A \notin \mathcal{C},\ B \in \mathcal{C}\}$.

Define the flux across a set of reactions $R$ at time $s$ as

$$F(R,s) = \sum_{A+t \to B+t \in R} k_f [A]_s e^{-G_{mc}} + \sum_{B \to A \in R} k_f e^{G^\circ(B) - G^\circ(A)} [B]_s + \sum_{t_1+t_2 \to A+t_1+t_2 \in R} k_f e^{-2G_{mc}} + \sum_{A \to \emptyset \in R} k_f e^{G^\circ(A)} [A]_s \tag{5.1}$$

where $[A]_s$ is the value of $[A]$ at time point $s$. Then $\frac{d[\mathcal{C}]}{ds}(s) = F(R^{in},s) - F(R^{out},s)$.

Fig. 4.4: Example energy landscapes. Coarse-grained depictions of the energy
landscapes for two zig-zag tile sets of different widths under two different physical 
conditions. Each square in the grid represents a "perfect" assembly of the labeled 
width and length. The shading in the square corresponding to each width and length 
represents the energy of a rectangular assembly of those dimensions. Darker is more 
favorable. Contour lines group assemblies of similar energies. Large circles denote as- 
sembly sizes that are critical nuclei. The most favorable critical nucleus (the principal 
critical nucleus) is denoted by a large hollow circle. 



We will use these formalisms to bound the rate of spurious nucleation in a zig-zag
tile set of width $k$. The spuriously nucleated assemblies for a zig-zag tile set of
width $k$ will be denoted $\mathcal{C}_k$. Let the top tile in Figure 2.1(a) be designated $t_t$, the
bottom tile be $t_b$, and the seed tile be designated $t_s$. Formally,

$$\mathcal{C}_k = \{A \in \mathcal{A} : \exists (x,y), (w,z) \in \mathbb{Z}^2 \text{ s.t. } A(x,y) = t_t,\ A(w,z) = t_b, \text{ and } \forall (q,r) \in \mathbb{Z}^2,\ A(q,r) \neq t_s\} \tag{5.2}$$

Note that the assemblies in $\mathcal{C}_k$ do not contain a seed tile, and we are measuring the
rate of formation of zig-zag ribbons without seed tiles.

The inward perimeter reactions for $[\mathcal{C}_k]$, which we call the spurious nucleation
reactions and denote by $R_k^{in}$, are the reactions for which the product is a full width

Fig. 5.1: Spurious nucleation reactions. Three spurious nucleation reactions for
a zig-zag tile set of width 4. The reaction may be either favorable or unfavorable.
In (b), the addition is favorable when $G_{mc} = 2G_{se} - \epsilon$ for small $\epsilon$, because two new
bonds are formed; in (a) and (c), the addition is unfavorable, because in each reaction
only one new bond is formed.



assembly, but the reactant is not. In other words, they are the addition reactions
which produce width $k$ assemblies from assemblies of width $k - 1$ by adding either
a top or bottom double tile (Figure 5.1). As shown in Section 4, under optimal
nucleation control conditions these reactions demarcate the point at which sustained
growth can proceed by exclusively favorable steps. The outward perimeter reactions,
which we call the ribbon shrinking reactions and denote by $R_k^{out}$, are those in
which a tile falls off a full width assembly to produce an assembly of width $k - 1$. For
assemblies that have suffered a ribbon shrinking reaction, there is also a downhill path
to complete melting in an energy landscape of the type shown in Figure 4.4 under
optimal nucleation control conditions.

The overall rate of spurious nucleation of width $k$ zig-zag crystals (in units of
molar per second),

$$n_k(s) = \frac{d[\mathcal{C}_k]}{ds}(s) = F(R_k^{in}, s) - F(R_k^{out}, s),$$

may be integrated over time to obtain the total concentration of spuriously nucleated
assemblies. Furthermore, an upper bound on $n_k(s)$ similarly translates into an upper
bound on the concentration of spuriously nucleated assemblies. Because the growth
path for full-width ribbons is so favorable (zig-zag growth), one such bound is obtained
by neglecting the ribbon shrinking reactions and considering just the spurious
nucleation reactions:

$$n_k^+(s) = F(R_k^{in}, s) \ge n_k(s).$$

In what follows, "the rate of spurious nucleation" refers to $n_k(s)$, the rate at which
the concentration of spuriously nucleated assemblies increases. We distinguish this
rate from $n_k^+(s)$, the rate of formation of all full-width assemblies, whether they later
shrink or not, by referring to the latter as "the rate of spurious nucleation reactions
(or events)."

Theorem 5.1. For a zig-zag tile set of width $k > 2$, if $2 > \frac{G_{mc}}{G_{se}} > 2 - \delta$, $\delta < \frac{1}{2k-3}$,
and $G_{se} > \frac{2k \ln 2}{1 - (2k-3)\delta}$, then for all times $s$, $n_k(s) < 4k_f e^{(\delta - k)G_{se}}$.

Proof. Since $n_k(s) \le n_k^+(s)$, we can prove the theorem by showing that $n_k^+(s) <
4k_f e^{(\delta - k)G_{se}}$. The spurious nucleation reactions are addition reactions, so if we
compute $n_k^+(s)$ using Equation 5.1, the second and fourth terms of the expression are
both zero. Spuriously nucleated assemblies are defined as assemblies of width $k$, so

Fig. 5.2: Assembly dimensions of rectangular assemblies. (a) A $k - 1 = 3$ by
$l = 8$ assembly. (b) A $k - 1 = 3$ by $l = 7$ assembly.



the reactants in the spurious nucleation reactions are of width $k - 1$ (only accretion
reactions are allowed). For a tile set of width $k > 2$, the third term of Equation
5.1 (the contribution of the interaction of two tiles) also drops out. Therefore, for a
tile set of width $k > 2$ with spurious nucleation reactions $R_k^{in}$,

$$n_k(s) \le n_k^+(s) = \sum_{A+t \to B+t \in R_k^{in}} k_f [A] e^{-G_{mc}}, \tag{5.3}$$

where $[A]$ is the concentration of assembly $A$ at time point $s$.

While it is in general difficult to calculate [A] at an arbitrary time point, the 
following lemma shows that the concentration of an assembly can be bounded by its 
concentration at steady state, which is easy to compute: 

Lemma 5.2. In a mass-action powered accretion kTAM, if in the initial state only
single tiles have a positive concentration, then every assembly has a concentration less
than or equal to its steady state concentration at all time points.⁵

Proof. See Appendix B. □
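
The behavior described by Lemma 5.2 can be illustrated on the smallest possible case. The sketch below is a toy model, not the paper's proof or simulation: it integrates only the final summation of Equation 3.1 for a single one-bond dimer $A$ (so $G^\circ(A) = -G_{se}$), deliberately ignores growth of $A$ into larger assemblies, and uses hypothetical parameter values chosen so that the trajectory converges quickly.

```python
from math import exp

def dimer_trajectory(g_mc, g_se, k_f=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of d[A]/ds = k_f*(e^{-2*G_mc} - e^{-G_se}*[A])
    for a one-bond dimer A, starting from [A] = 0."""
    formation = k_f * exp(-2 * g_mc)   # two tiles bind to form A
    k_r = k_f * exp(-g_se)             # A dissociates by breaking one bond
    conc, traj = 0.0, [0.0]
    for _ in range(steps):
        conc += dt * (formation - k_r * conc)
        traj.append(conc)
    return traj

g_mc, g_se = 3.0, 1.0                  # hypothetical values
a_ss = exp(g_se - 2 * g_mc)            # [A]_ss = e^{b*G_se - n*G_mc}, n = 2, b = 1
traj = dimer_trajectory(g_mc, g_se)
```

In this toy case the concentration rises monotonically toward `a_ss` and never exceeds it, which is the qualitative statement of the lemma.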

Lemma 5.2 implies that

$$F(R_k^{in}, s) \le \sum_{A+t \to B+t \in R_k^{in}} k_f [A]_{ss} e^{-G_{mc}},$$

where $[A]_{ss}$ is the concentration of assembly $A$ at steady state.

Partitioning the summation according to the length of the reactant assembly gives

$$F(R_k^{in}, s) \le \sum_{l=1}^{\infty} \sum_{\substack{A+t \to B+t \in R_k^{in} \\ \text{length}(A) = l}} k_f [A]_{ss} e^{-G_{mc}}. \tag{5.4}$$

To be a reactant in a spurious nucleation reaction, $A$ must have a width of $k - 1$.
Because all bonds in a zig-zag tile set assembly are unique to a given location within
the ribbon repeat unit, any potential tile addition either matches the assembly on all
sides, such that no errors occur, or matches on no sides, such that the addition does
not produce a bound assembly. Thus, $A$ cannot have any mismatches. Each assembly
$A$ in the preceding summation can therefore be viewed as a $k - 1$ by $l$ rectangular

5 The concentration of the class $\mathcal{C}_k$, which at the conditions we consider contains an infinite
number of assemblies, is actually infinite at steady state. The inward flux, as we will show, is finite 
because the concentration of unnucleated assemblies stays finite at steady state, even though there 
are also an infinite number of unnucleated assemblies. 






assembly of one of the types shown in Figure 5.2 with zero or more tiles missing.⁶
By assumption, $2G_{se} > G_{mc}$, so the free energy of a $k - 1$ by $l$ assembly cannot be
more favorable than the free energy of the $k - 1$ by $l$ rectangle that contains it, since
any missing tiles in the rectangle could be added by favorable reactions. Therefore,
the concentration of any $k - 1$ by $l$ assembly at steady state must be no larger than
the concentration of its corresponding $k - 1$ by $l$ rectangular assembly. Note that this
bound is very loose, since most assembly types have several tiles attached by only one
bond and therefore have a higher free energy. Let $A_{k-1,l}$ be a $k - 1$ by $l$ rectangular
assembly, and $C(k-1,l)$ be the number of assemblies of width $k - 1$ and length $l$.
Each assembly can bind a single tile in up to $l/2$ locations (since the tile must be a
double tile) along either the top or bottom edge. Thus,



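$$F(R_k^{in}, s) \le \sum_{l=1}^{\infty} \sum_{\substack{A+t \to B+t \in R_k^{in} \\ \text{length}(A) = l}} k_f [A_{k-1,l}]_{ss} e^{-G_{mc}} \tag{5.5}$$

$$\le \sum_{l=1}^{\infty} C(k-1,l)\, \frac{l}{2}\, k_f [A_{k-1,l}]_{ss} e^{-G_{mc}}. \tag{5.6}$$

A counting argument shows that $C(k-1,l) < 2^{(k-1)l+1}$, so

$$F(R_k^{in}, s) < \sum_{l=1}^{\infty} 2^{(k-1)l}\, l\, k_f [A_{k-1,l}]_{ss} e^{-G_{mc}}. \tag{5.7}$$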

The steady state concentration of an unseeded assembly with $n$ tiles and $b$ bonds
is given by $[A]_{ss} = e^{-nG_{mc} + bG_{se}}$. The assembly $A_{k-1,l}$ contains $(k-2)l$ small tiles
and $\lceil l/2 \rceil$ top (or bottom) tiles. There are $(l-1)(k-2)$ horizontal bonds between
small tiles and $\lceil l/2 \rceil - 1$ horizontal bonds between large tiles. In addition, there are
up to $l$ vertical bonds in each of the $k - 2$ spaces between rows of tiles. Therefore,

$$[A_{k-1,l}]_{ss} < \exp\left( -\left((k-2)l + l/2\right) G_{mc} + \left((k-2)(l-1) + l/2 + (k-2)l\right) G_{se} \right).$$

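Applying the assumption $G_{mc} > (2 - \delta)G_{se}$ and simplifying,

$$[A_{k-1,l}]_{ss} < \exp\left( (2-k)G_{se} + \left(k\delta - \tfrac{1}{2} - \tfrac{3\delta}{2}\right) l\, G_{se} \right).$$

Thus,

$$F(R_k^{in}, s) < k_f e^{-G_{mc}} e^{(2-k)G_{se}} \sum_{l=1}^{\infty} l\, 2^{(k-1)l} e^{\left(k\delta - \frac{1}{2} - \frac{3\delta}{2}\right) l\, G_{se}}.$$

Since $k\delta - \frac{1}{2} - \frac{3\delta}{2} < 0$ when $k > 2$ and $\delta < \frac{1}{2k-3}$, bounding $G_{se}$ from below
preserves the inequality. Therefore, when $G_{se} > \frac{2k \ln 2}{1 - (2k-3)\delta}$,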



"It could also be a subset of a rectangular assembly with top instead of bottom tiles, but the 
free energy of both kinds of assemblies is the same. To account for this, we include a 2 pre-factor in 
the number of assemblies corresponding to a k — 1 by I rectangle, thereby counting both assemblies 
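with top tiles and assemblies with bottom tiles.

$$F(R_k^{in}, s) < k_f e^{-G_{mc}} e^{(2-k)G_{se}} \sum_{l=1}^{\infty} l\, 2^{(k-1)l} e^{\left(k\delta - \frac{1}{2} - \frac{3\delta}{2}\right) l\, \frac{2k \ln 2}{1 - (2k-3)\delta}} \tag{5.8}$$

$$= k_f e^{-G_{mc}} e^{(2-k)G_{se}} \sum_{l=1}^{\infty} l\, 2^{(k-1)l} e^{-kl \ln 2} \tag{5.9}$$

$$= k_f e^{-G_{mc}} e^{(2-k)G_{se}} \sum_{l=1}^{\infty} l\, 2^{-l} \tag{5.10}$$

$$= 2k_f e^{-G_{mc}} e^{(2-k)G_{se}} \tag{5.11}$$

$$< 2k_f e^{(\delta - k)G_{se}}. \tag{5.12}$$

□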
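
The chain of inequalities in the proof can be spot-checked numerically. The sketch below is only a sanity check at one hypothetical parameter point ($k = 4$, $\delta = 0.1$, $G_{se} = 12$, $G_{mc} = 23$, $k_f = 1$), and the function names are not from the paper. It verifies the hypotheses of Theorem 5.1, evaluates the series $\sum_l l\, 2^{(k-1)l} e^{(k\delta - 1/2 - 3\delta/2) l G_{se}}$, and compares the result with the final expression $2k_f e^{(\delta-k)G_{se}}$ reached in the proof.

```python
from math import exp, log

def theorem_5_1_conditions(k, delta, g_mc, g_se):
    """Check the hypotheses on k, delta, G_mc/G_se and G_se."""
    ratio = g_mc / g_se
    return (k > 2
            and 2 > ratio > 2 - delta
            and delta < 1 / (2 * k - 3)
            and g_se > 2 * k * log(2) / (1 - (2 * k - 3) * delta))

def series_sum(k, delta, g_se, terms=200):
    """Partial sum of sum_l l * 2^((k-1) l) * exp((k*delta - 1/2 - 3*delta/2) * l * G_se)."""
    c = (k - 1) * log(2) + (k * delta - 0.5 - 1.5 * delta) * g_se
    return sum(l * exp(c * l) for l in range(1, terms + 1))

def proof_bound(k, delta, g_se, k_f):
    """Final expression of the proof: 2 * k_f * exp((delta - k) * G_se)."""
    return 2 * k_f * exp((delta - k) * g_se)

k, delta, g_mc, g_se, k_f = 4, 0.1, 23.0, 12.0, 1.0   # hypothetical point
flux_bound = k_f * exp(-g_mc) * exp((2 - k) * g_se) * series_sum(k, delta, g_se)
```

At this point the series evaluates to roughly 1.1, below the value 2 of $\sum_l l\,2^{-l}$ used in the proof, so `flux_bound` falls under `proof_bound(k, delta, g_se, k_f)`.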

This theorem says that the spurious nucleation rate, $n_k$, decreases exponentially
with $k$ and with $G_{se}$, within the limits of applicability of the theorem, which requires
larger $G_{mc}$ for larger $k$, and hence slower growth rates. The strength of the theorem,
therefore, lies in the extent to which spurious nucleation decreases faster than the
growth rate, $r_k$, of seeded crystals. These relative rates translate into the degree of
purity that can be obtained when attempting to grow seeded crystals: suppose the
concentration of seeds is $c$, and they are grown to a length $L$ during a time period
$s = L/r_k$. The concentration of unseeded crystals that will have spuriously nucleated
in that time is less than $s \cdot n_k = L \frac{n_k}{r_k}$, i.e., the fraction of crystals that were spuriously
nucleated is less than $\frac{L}{c} \cdot \frac{n_k}{r_k}$. (When we use $n_k$ without specifying a particular time, we
mean its steady state value, which is an upper bound.) Regardless of what length or
amount of seeded crystals is desired, reducing $\frac{n_k}{r_k}$ is the relevant metric for increasing
the yield of desired structures.
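
The purity estimate in the preceding paragraph is a one-line calculation. The sketch below uses a hypothetical function name and made-up input values purely to show the units involved: $n_k$ in M/s, $r_k$ in layers/s, $L$ in layers and $c$ in M.

```python
from math import isclose

def spurious_fraction_bound(n_k, r_k, length, seed_conc):
    """Upper bound (L/c) * (n_k/r_k) on the spuriously nucleated fraction."""
    growth_time = length / r_k          # s = L / r_k, in seconds
    spurious_conc = growth_time * n_k   # s * n_k, in molar
    return spurious_conc / seed_conc

# Hypothetical inputs: n_k = 1e-15 M/s, r_k = 0.01 layers/s,
# L = 1000 layers, c = 1e-9 M.
fraction = spurious_fraction_bound(1e-15, 0.01, 1000, 1e-9)
```

With these illustrative numbers the growth takes $10^5$ seconds and the bound on the spurious fraction is 0.1; halving $n_k/r_k$ halves the bound regardless of $L$ or $c$.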

One way to study the trade-off between $n_k$ and $r_k$ is to ask, given a target growth
rate $r$, what is the lowest nucleation rate that can be achieved by adjusting $G_{mc}$ and
$G_{se}$ while maintaining $r_k = r$? Previous work [51] has shown that near the $G_{mc}/G_{se} = 2$ phase
boundary that is relevant to our theorem, the growth rate is closely approximated by

$$r_k \approx \frac{k_f}{k-1}\left(e^{-G_{mc}} - e^{-2G_{se}}\right),$$

measured in layers per second. The lowest nucleation rate for a given target growth
rate $r$ is then $n_k^*(r) = \min_{G_{se}, G_{mc} \text{ s.t. } r_k = r} n_k$. A plot of $n_k^*(r)$ vs $r$, if it could be calculated,
would reveal how much the spurious nucleation rate decreases when the growth rate
is decreased. Theorem 5.1 only gives us an upper bound on $n_k^*(r)$, but even so, this
already gives us a characterization of the advantage provided by wider zig-zag crystals.

Specifically, if we choose $2G_{se} - G_{mc} = \epsilon = \ln k$, $\delta = \frac{1}{2(2k-3)}$ and $G_{se} > 4k \ln k$,
then Equation 5.11 guarantees that

$$n_k^* \le n_k < 2k_f e^{-G_{mc}} e^{(2-k)G_{se}} = 2k_f e^{\epsilon} e^{-2G_{se}} e^{(2-k)G_{se}} = 2k_f e^{\epsilon} e^{-kG_{se}} = 2k_f e^{\ln k} e^{-kG_{se}} = 2k_f k e^{-kG_{se}}.$$

Define $n_k^{\dagger} = 2k_f k e^{-kG_{se}}$. Also

$$r_k \approx \frac{k_f}{k-1}\left(e^{-G_{mc}} - e^{-2G_{se}}\right) = \frac{k_f}{k-1}\left(e^{\epsilon} e^{-2G_{se}} - e^{-2G_{se}}\right) = k_f e^{-2G_{se}}.$$

The ratio $\frac{n_k^*}{r_k}$ describes the trade-off between assembly speed $r_k$ and spurious
nucleation $n_k$. This ratio can be no larger than $\frac{n_k^{\dagger}}{r_k}$. For $k > 2$ and the chosen
parameters,

$$\frac{n_k^{\dagger}}{r_k} = \frac{2k_f k e^{-kG_{se}}}{k_f e^{-2G_{se}}} = 2k e^{(2-k)G_{se}} < 2k e^{(2-k)4k \ln k},$$

which decreases exponentially with $k$. Thus, under these conditions, seeded zig-zag
crystals can be grown with exponentially greater yield as width increases.
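
To see how quickly this ratio falls with width, the bound $2k e^{(2-k)G_{se}}$ can be tabulated for small $k$. The sketch below is illustrative only: the function name is hypothetical, and $G_{se}$ is set to $4k \ln k + 1$, an arbitrary choice that just satisfies the stated requirement $G_{se} > 4k \ln k$.

```python
from math import exp, log

def tradeoff_bound(k, g_se):
    """Bound 2k * exp((2 - k) * G_se) on the nucleation-to-growth ratio,
    valid for epsilon = ln k, delta = 1/(2(2k-3)) and G_se > 4 k ln k."""
    if g_se <= 4 * k * log(k):
        raise ValueError("requires G_se > 4 k ln k")
    return 2 * k * exp((2 - k) * g_se)

# Arbitrary illustrative choice of G_se just above the required minimum.
bounds = {k: tradeoff_bound(k, 4 * k * log(k) + 1.0) for k in range(3, 7)}
```

Each increment of $k$ multiplies the exponent by a larger negative factor and also raises the minimum $G_{se}$, so the entries of `bounds` shrink by many orders of magnitude from one width to the next.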

The bound $n_k^{\dagger}$ where $\epsilon = 0.1$ is plotted against $r_k$ in Figure 6.1 for k = 3, 4, 5 and
6. While these bounds characterizing the trade-off between $n_k^*$ and $r_k$ are rigorous,
because the bound of Theorem 5.1 is so loose, it is expected that $n_k^*$ is actually much lower than
the bound $n_k^{\dagger}$. In the following sections, we will see that this is true; furthermore,
the true slopes are even steeper than obtained by Theorem 5.1.



6. Numerical Estimates of Spurious Nucleation Rates. Having proven in 
the previous section that zig-zag tile sets can be designed to achieve arbitrarily low 
spurious nucleation rates relative to the growth rates (using a loose upper bound), we 
now ask whether the nucleation barrier provided by zig-zag tile sets is sufficient for 
practical implementation in the laboratory (which requires more accurate quantitative 
assessments). There are two main concerns: first, as each tile must be synthesized, 
k must be small (6 is currently practical, while 50 is currently too large); second, 
assembly time must not be too long. (Growing 1000 layers of seeded crystals with less
than 1% spurious nucleation, which we refer to as the "typical reaction," seems like
a reasonable goal to accomplish within one week.) Because the analytic bounds of
Section 5 are too loose to allow us to obtain a realistic evaluation of nucleation rates,
we now develop more accurate numerical calculations and stochastic simulations for 
estimating spurious nucleation rates. 

The analysis in Section 5 overestimates the spurious nucleation rate in three
ways. First, it overestimates the concentration of almost all kinds of assemblies by
assuming they have the same concentration as a rectangular assembly of the same
length and width, and it overcounts the number of different types of assemblies.
Second, Lemma 5.2 shows that the spurious nucleation rate at steady state is the
maximal spurious nucleation rate. However, it may take longer to approach steady
state than the time needed to run a "typical reaction," and far from steady state, the
spurious nucleation rate may be much smaller than the spurious nucleation rate at
steady state. Lastly, this analysis defines a spurious nucleation event for a zig-zag tile

Fig. 6.1: Analytical upper bounds, steady-state calculations and asymptotic
simulations of spurious nucleation rates. The graph compares the growth rate
(in layers/s) and the rate of spurious nucleation events (in M/s), for $2G_{se} - G_{mc} =
\epsilon = 0.1$, $k_f = 6 \times 10^5$ /M/s, and for all points $G_{se} > \frac{2k \ln 2}{1 - (2k-3)\delta}$, for k = 3, 4, 5 and 6.
Analytical upper bounds on the nucleation rate ($n_k^{\dagger} = 4k_f e^{(\delta - k)G_{se}}$) are those given
by Theorem 5.1. The method of numerical calculation of spurious nucleation reaction
rates at steady state, $n_k^+$, is described in Section 6.1. Stochastic simulation methods
(giving estimates of the rate of spurious nucleation reactions) are described in Section 6.2.



set of width k as a reaction that produces an assembly of width k, and neglects the 
backward reaction. In practice, many reactions that form an assembly of width k are 
unfavorable, so that the product assembly frequently shrinks back to a sub-critical size 
instead of growing larger. When conditions only slightly favor growth, even assemblies 
containing several layers have a reasonable chance of shrinking to nothing before they 
grow substantially. We expect $n_k \ll n_k^+$ in this case.

While it is not possible to compute the nucleation rate exactly, in this section
we describe three numerical techniques that correct each inaccuracy described above
for zig-zag tile sets of widths k = 3, 4, 5 and 6. In Section 6.1, we compute the
rate at which ribbons of width $k$ are formed at steady state using a much more
accurate count of the number and steady state concentration of assemblies. These
computations show that the analytic bound of Theorem 5.1 is too high by at least
4 orders of magnitude for the range of parameters studied. In Section 6.2, we use
a stochastic simulation of tile assembly to estimate the rate of spurious nucleation
reactions $n_k^+$. Our results indicate that spurious nucleation reactions occur during a
"typical reaction" at a rate that is no more than an order of magnitude lower than
the rate at steady state computed in Section 6.1. In Section 6.3, we use the stochastic
simulation to investigate whether the rate of spurious nucleation reactions ($n_k^+(s) =
F(R_k^{in}, s)$) in a typical reaction accurately predicts the rate at which large assemblies
appear (which at steady state is equivalent to $n_k(s) = F(R_k^{in}, s) - F(R_k^{out}, s)$). We
find that for the range of parameters studied, at least 99% of assemblies that reach
full width will melt before growing into large crystals, and thus our other estimates
of spurious nucleation rates may be overestimates of $n_k$ by at least two orders of
magnitude. In Section 6.4, we show that these results together indicate that a zig-
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Fig. 6.2: Estimates of nucleation rates from stochastic simulations, nf and 

njj° vs. rk for k = 3, 4, 5 and 6. e = 0.1. The line n = r is also plotted to illustrate 
values of n if there were no improvement in nucleation rates with assembly slowdown. 



zag tile set of width 5 or 6 should be large enough to prevent almost all spurious 
nucleation in a "typical reaction," while maintaining reasonable assembly speeds. We
conclude with an important caveat to these results. Our results are derived under 
a powered accretion model of kTAM, while in experiments, small assemblies may 
aggregate rather than growing exclusively by single tile additions, thus potentially 
producing nuclei that reach a critical size more quickly than our simulations indicate. 

6.1. Spurious Nucleation Rates at Steady State. Recall that for a zig-zag
tile set of width $k > 2$, the steady state rate of spurious nucleation reactions is given
by the sum

$$n_k^+ = \lim_{s \to \infty} F(R_k^{in}, s) = \sum_{l=1}^{\infty} \sum_{\substack{A+t \to B+t \in R_k^{in} \\ \text{s.t. length}(A) = l}} k_f [A]_{ss} e^{-G_{mc}},$$

which ignores the rate at which spuriously nucleated assemblies dissolve back into
pre-nucleated assemblies. While $[A]_{ss}$ is known (at steady state, for an assembly $A$
with $n$ tiles and $b$ bonds, $[A]_{ss} = e^{bG_{se} - nG_{mc}}$), it is not practical to compute the
sum exactly because there are an infinite number of spurious nucleation reactions.
Additionally, it can be impractical to evaluate the inner sum even for a single value
of $l$: no efficient algorithm is known for exactly enumerating the reactions in $R_k^{in}$ (see
e.g. [19] for the related problem of counting polyominos). The number of distinct
reactions increases exponentially with the length of $A$, so it is prohibitive to calculate
all but the first terms of the sum.

Despite these difficulties, the expression can be calculated precisely, with known 
error bounds, for many k. The following lemma shows that under many reaction 
conditions of interest, the sum converges quickly, and its value can be approximated 
by summing only the first few terms: 

Lemma 6.1. When $G_{se} > (\ln 10)(k - 2) + \ln 4$, $G_{mc} = 2G_{se} - \epsilon$,
$0 < \epsilon < \frac{1}{2k-3}$, $k > 2$ and $l$ is even,

$$\sum_{p=l+1}^{\infty} \; \sum_{\substack{A + t \to B + t \,\in\, R_k^n \\ \text{s.t. } \mathrm{length}(A) = p}} k_f [A]_{ss}\, e^{-G_{mc}} \;<\; 2 \sum_{\substack{A + t \to B + t \,\in\, R_k^n \\ \text{s.t. } \mathrm{length}(A) = l}} k_f [A]_{ss}\, e^{-G_{mc}}.$$

Proof. See Appendix C. □

Fig. 6.3: Hypothesized principal critical nucleus for most spurious nucleation
reactions. The rate of spurious nucleation reactions by this assembly (shown
in successively lighter shades of gray for tile sets of widths 3, 4, 5 and 6) accounts for
a large portion of spurious nucleation at slow speeds, and also accounts for the rate
of increase in spurious nucleation rates as assembly gets faster.

Thus, to calculate the spurious nucleation rate to within an absolute error of $\delta$, it is
only necessary to compute the inner sums of the series until the inner sum for the current
value of $l$ is (even and) less than $\delta/2$. (Note that this approach does not directly yield
a proof of an analytic bound for arbitrary $k$, because the formula for the nucleation rate
is not a closed form expression.)
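The stopping rule can be sketched in a few lines of Python. This is only an illustration of the truncation logic, not the procedure used to produce the figures: `inner_sum` is a hypothetical callable standing in for the inner sum over assemblies of a given length, and the geometric toy series below is an assumption chosen so that the tail bound of Lemma 6.1 visibly holds.

```python
from typing import Callable, Tuple


def truncated_nucleation_rate(
    inner_sum: Callable[[int], float],
    delta: float,
    max_length: int = 10_000,
) -> Tuple[float, int]:
    """Add up N_1 + N_2 + ... until an even-indexed term drops below delta / 2.

    If the tail bound sum_{p > l} N_p < 2 * N_l holds for even l, the
    neglected tail is then smaller than delta.
    """
    total = 0.0
    for length in range(1, max_length + 1):
        term = inner_sum(length)
        total += term
        if length % 2 == 0 and term < delta / 2:
            return total, length
    raise RuntimeError("series did not reach the requested accuracy")


def toy_inner_sum(length: int) -> float:
    """Hypothetical stand-in for N_l: a geometric series with ratio 1/2."""
    return 0.5 ** length


rate, last_length = truncated_nucleation_rate(toy_inner_sum, delta=1e-4)
```

For the toy series the exact sum is 1, so the returned `rate` can be compared against it directly; for the real inner sums each call to `inner_sum` would require enumerating assemblies, which is the expensive step.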

We have used this series truncation method to calculate the rate of spurious
nucleation to 1 part in $10^4$ for $k = 3, 4, 5$ and $6$ and for a range of $G_{se}$, $G_{mc}$
for which $\epsilon = 0.1$. The values of $G_{se}$, $G_{mc}$ and $k$ used were in a regime in which
Lemma 6.1 applies. The results are shown in Figure 6.1.

In addition to the numerical calculations providing lower estimates, the slopes
of $\log n_k^+$ vs. $\log r_k$ obtained numerically in Figure 6.1 are larger than those
given by the analytic bounds. Is this reasonable? In the limit as $G_{mc} \to \infty$, all
spurious nucleation should be dominated by the single species with the highest steady
state concentration (adding tiles becomes so unfavorable that other species can be
neglected). The analysis in Section 4 suggests that this assembly is the one shown
in Figure 6.3. The steady state concentration of this assembly $A$ for a tile set of
width $k$ is $[A]_{ss} = e^{-(2k-3)G_{mc} + (3k-6)G_{se}} = e^{-kG_{se} + (2k-3)\epsilon}$, where $\epsilon = 2G_{se} - G_{mc}$.
If all forward nucleation reactions involve $A$, then

$$n_k^+ \approx 2 k_f [A]_{ss}\, e^{-G_{mc}} = 2 k_f\, e^{-(k+2)G_{se} + (2k-2)\epsilon} \qquad (6.1)$$

while the speed of growth is

$$r_k = \frac{k_f}{k-1}\left(e^{-G_{mc}} - e^{-2G_{se}}\right) = \frac{k_f\, e^{-2G_{se}}\left(e^{\epsilon} - 1\right)}{k-1},$$

Tile set width   analytic bound       n_k^+                n_k^{50}
3                3 x 10^24 years      1 x 10^23 years      9 x 10^5 years
4                2 x 10^8 years       3 x 10^6 years       7 years
5                900 years            10 years             20 days
6                2 years              20 days              20 hours

Table 6.1: Estimated time needed to grow 1000 layers in 1 pmol of free tiles
without any spurious nucleation, based on approximations of the spurious
nucleation rate. While the total number of tiles in each reaction is constant, the
volume differs depending on the speed, which determines the concentration and the
spurious nucleation rate.



Tile set width   analytic bound       n_k^+                n_k^{50}
3                1 x 10^11 years      1 x 10^10 years      200 days
4                40 years             5 years              12 hours
5                10 days              2 days               1 hour
6                7 hours              2 hours              15 minutes

Table 6.2: Estimated time needed to grow 1000 layers such that less than 1
percent of assemblies are spuriously nucleated based on approximations of
the spurious nucleation rate. The seed concentration is the concentration of
free tiles $e^{-G_{mc}}$.



and thus the slope would be $\frac{k+2}{2}$, as observed. The rough estimate of $n_k^+$ given in
Equation 6.1 is within a factor of three of the value calculated to an accuracy of 1
part in $10^4$, as guaranteed by Lemma 6.1.
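The slope argument can be checked numerically. The sketch below assumes the single-nucleus estimate $n_k^+ \approx 2 k_f [A]_{ss} e^{-G_{mc}}$ with $[A]_{ss} = e^{-(2k-3)G_{mc} + (3k-6)G_{se}}$ and the growth speed $r_k = \frac{k_f}{k-1}(e^{-G_{mc}} - e^{-2G_{se}})$, with $\epsilon$ held fixed; the function names and the placeholder value `k_f = 1.0` are illustrative assumptions (the rate constant cancels in the slope).

```python
import math


def nucleation_rate_estimate(k: int, g_se: float, eps: float, k_f: float = 1.0) -> float:
    """Rough estimate n_k^+ ~ 2 k_f [A]_ss e^(-G_mc) for the single dominant nucleus."""
    g_mc = 2.0 * g_se - eps
    a_ss = math.exp(-(2 * k - 3) * g_mc + (3 * k - 6) * g_se)
    return 2.0 * k_f * a_ss * math.exp(-g_mc)


def growth_rate(k: int, g_se: float, eps: float, k_f: float = 1.0) -> float:
    """Growth speed r_k = k_f (e^(-G_mc) - e^(-2 G_se)) / (k - 1)."""
    g_mc = 2.0 * g_se - eps
    return k_f * (math.exp(-g_mc) - math.exp(-2.0 * g_se)) / (k - 1)


def log_log_slope(k: int, g_se_lo: float, g_se_hi: float, eps: float = 0.1) -> float:
    """Slope of log n_k^+ versus log r_k between two bond strengths at fixed eps."""
    d_log_n = math.log(nucleation_rate_estimate(k, g_se_hi, eps)) - math.log(
        nucleation_rate_estimate(k, g_se_lo, eps)
    )
    d_log_r = math.log(growth_rate(k, g_se_hi, eps)) - math.log(growth_rate(k, g_se_lo, eps))
    return d_log_n / d_log_r


slopes = {k: log_log_slope(k, 8.0, 10.0) for k in (3, 4, 5, 6)}
```

Under these assumptions the slope comes out as $(k+2)/2$ for every width, because $\log n_k^+$ falls by $(k+2)$ per unit of $G_{se}$ while $\log r_k$ falls by 2.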

6.2. Stochastic simulations for estimating forward nucleation rates be- 
fore steady state is achieved. In order to determine whether the steady state 
approximation is accurate over a typical spurious nucleation reaction, we simulated 
zig-zag tile assembly for tile sets of widths k = 3, 4, 5 and 6 and measured the 
rates of spurious nucleation events during the time it should take to grow 1000 layers 
from seeds. Since there are an infinite number of powered accretion reactions, exact 
simulation of growth under the kTAM using mass action dynamics is not possible. 
Instead, we simulated assembly growth using stochastic chemical reaction dynamics 
for discrete numbers of molecular assemblies. Here, simulation is possible because 
even though the probability of any of the infinite number of species arising is larger 
than 0, the total number of species tracked at a given time is finite. To approximate
the nucleation rate, we simulated a tiny reaction volume, and used these results to 
predict the nucleation rate in a much larger volume. 

We used the Gillespie algorithm [18] to sample the trajectories of stochastic
dynamics of the zig-zag tiles in a small volume $V$, whose value is chosen to ensure the
accuracy of our nucleation rate estimate as described below. Following the powered
model, our simulation assumes the concentration of each tile type to be constant and
explicitly tracks each assembly in the volume containing more than one tile. Initially,
no multi-tile assemblies are present. Single tiles are present at a concentration of
$e^{-G_{mc}}$, so the rate of two tiles colliding (and thus producing a new assembly to be
explicitly tracked) is $A k_f V e^{-2G_{mc}}$ molecules / second, where $A$ is Avogadro's number.
For each assembly containing two or more tiles, the rate of tile addition at each
available site is $k_f e^{-G_{mc}}$ and the rate at which a tile with $b$ bonds falls off an assembly
is $k_f e^{-bG_{se}}$.
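A minimal sketch of the three rates and of one step of Gillespie's direct method is given below. It is not the simulator used for the results in this section: the helper names are hypothetical, `simulate_chain` is a deliberately simplified toy (a single linear assembly whose end tile is held by one bond) rather than a zig-zag tile set, and units of litres and per-molar per-second for `k_f` are assumed.

```python
import math
import random
from typing import List, Tuple

AVOGADRO = 6.02214076e23  # molecules per mole


def dimer_formation_rate(k_f: float, volume_l: float, g_mc: float) -> float:
    """Events per second at which two free tiles meet in volume V: A k_f V e^(-2 G_mc)."""
    return AVOGADRO * k_f * volume_l * math.exp(-2.0 * g_mc)


def attachment_rate(k_f: float, g_mc: float) -> float:
    """Per-site tile addition rate, k_f e^(-G_mc)."""
    return k_f * math.exp(-g_mc)


def detachment_rate(k_f: float, g_se: float, bonds: int) -> float:
    """Rate at which a tile held by `bonds` bonds falls off, k_f e^(-b G_se)."""
    return k_f * math.exp(-bonds * g_se)


def gillespie_step(rates: List[float], rng: random.Random) -> Tuple[float, int]:
    """One direct-method step: (waiting time, index of the reaction that fires)."""
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("no reaction can fire")
    waiting_time = rng.expovariate(total)
    threshold = rng.random() * total
    running = 0.0
    for index, rate in enumerate(rates):
        running += rate
        if threshold < running:
            return waiting_time, index
    return waiting_time, len(rates) - 1  # guard against floating-point round-off


def simulate_chain(k_f: float, g_mc: float, g_se: float, t_end: float,
                   rng: random.Random) -> int:
    """Toy model: a linear assembly gains a tile or loses a singly bonded end tile."""
    size, t = 2, 0.0
    while size >= 2:
        rates = [attachment_rate(k_f, g_mc), detachment_rate(k_f, g_se, 1)]
        dt, which = gillespie_step(rates, rng)
        if t + dt > t_end:
            break
        t += dt
        size += 1 if which == 0 else -1
    return size
```

A full simulation would keep one such rate list per tracked assembly, with one entry per available attachment site and per removable tile, plus the dimer formation channel that creates new assemblies to track.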

For $k = 3, 4, 5$ and $6$ and a range of $G_{se}$ and $G_{mc}$ where $\epsilon = 0.1$, we counted
the number of spurious nucleation events, $m$, that took place over the time course of
a "typical reaction", $s = 1000/r_k$, in a volume $V$ that was chosen large enough to
ensure that the statistical error in $m$ is less than 10 percent of its value ($P > 0.95$;
see footnote 7). If our simulations yield a nucleation rate of $m$ events per second, the
molar rate of nucleation events for a bulk volume is given by $n_k^+ \approx \frac{m}{VA}$. The results
of the simulation (which were possible only for small enough $G_{se}$ such that nucleation
events were frequent enough to be counted) are shown in Figure 6.2. For $k = 3$ and
$k = 4$, these rates are within a factor of 2 of the linear extrapolation of the curves from
Figure 6.1, and for $k = 5$ and $k = 6$ these rates are within a factor of 10, indicating that
the choice in Section 5 to bound nucleation rates based on steady state concentrations
did not affect our estimate of nucleation rates too greatly. This should be expected,
given that under the conditions we studied most steady state nucleation appears to
involve assemblies like the one shown in Figure 6.3.
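The two bookkeeping steps in this paragraph, choosing a volume large enough for the count to be statistically meaningful and converting a count in a tiny volume into a molar rate, can be sketched as follows. The Poisson assumption (standard deviation equal to the square root of the mean count) is ours and is not stated in the text; the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def min_expected_events(relative_error: float = 0.1, n_sigma: float = 2.0) -> float:
    """Smallest mean count for which n_sigma * sqrt(mean) <= relative_error * mean.

    Assumes Poisson-distributed counts, so the standard deviation is sqrt(mean).
    """
    return (n_sigma / relative_error) ** 2


def molar_nucleation_rate(events: float, duration_s: float, volume_l: float) -> float:
    """Convert events counted in `volume_l` over `duration_s` into mol / (L s)."""
    return events / (duration_s * volume_l * AVOGADRO)


needed = min_expected_events()  # mean count needed for a 10 percent, two-sigma criterion
```

Under the Poisson assumption the 10 percent, two-sigma criterion translates into roughly 400 expected nucleation events per simulation, which is what sets the simulated volume for a given nucleation rate.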

6.3. Nucleation of Long Ribbons. In this paper, we have defined a spurious 
nucleation reaction for a zig-zag tile set of width k as a reaction in which an assembly 
of width k— 1 grows to width k. The goal was that this definition would be inclusive, 
such that all long ribbons would undergo at least one spurious nucleation reaction, 
but not too loose, such that most spurious nucleation reactions lead to a long ribbon. 
However, many of these spurious nucleation reactions are not energetically favorable — 
an assembly may briefly reach width k before a tile falls off. The assembly then either
melts or undergoes another spurious nucleation reaction.

At what rate do long ribbons appear? Using the stochastic simulation described
in the last section and the same range of physical reaction parameters, we measured
$m'$, the number of ribbons containing 50 tiles or more that were present at the end
of a "typical reaction", for the widths 3, 4, 5, or 6. $m'$ is an estimate of the number
of spurious nucleation events that did not subsequently melt, and thus it provides
the basis for an estimate for $n_k$. As only those crystals that nucleated sufficiently far
before the end of the simulation will have grown to a large enough size to have been
counted, we use the formula

$$n_k^{50} = \frac{m'}{\left(s - \frac{50}{(k-1)\, r_k}\right) V A},$$

where $s$ is the time of the simulation and $\frac{50}{(k-1)\, r_k}$ is the approximate time to grow
to 50 tiles via zig-zag growth. The results are shown in Figure 6.2.

These simulations suggest that much of the looseness of the analytical bound on
nucleation is caused by our neglecting crystals which undergo a nucleation reaction
and subsequently melt. These simulations suggest that at least 99% of crystals
that undergo a spurious nucleation do not grow into crystals consisting of 50 or more
tiles. That is, $n_k \ll n_k^+$.

6.4. Expected Effectiveness in Practice. Do these results indicate that nucleation
control with tile sets of width 6 or less is good enough? Recall that our
"reasonable goal for a typical reaction" addresses how much time is needed to grow
seeded ribbons of 1000 layers with less than 1% of the crystals being spuriously
nucleated. The fraction of crystals that are spuriously nucleated is given by
$f = \frac{n_k L}{r_k c}$, where $L$ is the number of layers to be grown on seeds, and $c$ is the
concentration of seeds. While the simulations only measured $n_k^{50}$ for large values of
$r_k$, it is possible to approximate $n_k^{50}$ for smaller values of $r_k$ by assuming that the graph
of $\log(n_k)$ vs. $\log(r_k)$ continues as a line with constant slope as $r_k$ and $n_k$ decrease. We
consider two cases. The first situation (more stringent than "reasonable") is to grow
many ribbons (say, from 1 picomole of each tile type, making $6 \times 10^8$ ribbons of length
1000) and not have more than a single spuriously nucleated crystal, i.e.,
$f < 1.67 \times 10^{-9}$ and $c = e^{-G_{mc}}/1000$. To satisfy this constraint, we express $f$ in terms of
$c$ using our estimates for $n_k$, and solve for $c$ (and hence the concentration of free tiles
and the volume of the reaction by $A e^{-G_{mc}} V = 10^{-12}$ moles). The time needed to grow
the crystals is therefore given as $L/r_k$ for these conditions. The results, shown in
Table 6.1, suggest that if $n_k^{50}$ is accurate, then this stringent goal could be met using
a width 6 zig-zag tile set and a day of growth. The looser estimates and smaller tile
set widths are less encouraging. The second situation we consider is the "reasonable"
one; again, $c = e^{-G_{mc}}/1000$, but now we only require $f < 0.01$. The results are shown
in Table 6.2. In this case, acceptable growth fidelity is predicted to be achieved in
less than an hour for width 6, and for only slightly longer times for widths 5 and 4.
However, all these estimates are very sensitive to the coefficients of the linear fits to
$\log n_k$ vs. $\log r_k$, which are imperfect because the relationship is not perfectly linear.

7 That is, twice the standard deviation of the number of nucleation events per simulation is less
than ten percent of the average number of nucleation events per simulation.
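The arithmetic behind the stringent case can be reproduced with a short sketch. The expression used for the spurious fraction, the nucleation rate times the growth time $L/r_k$ divided by the seed concentration $c$, is an assumption about the form of $f$; the function names and the sample numbers passed to `spurious_fraction` are hypothetical and are not values from the tables.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ribbons_from_tiles(moles_per_tile_type: float, layers: int) -> float:
    """Number of ribbons of `layers` layers that a given amount of each tile type can make."""
    return moles_per_tile_type * AVOGADRO / layers


def spurious_fraction(n_k: float, r_k: float, layers: float, seed_conc: float) -> float:
    """Assumed form f = n_k * (L / r_k) / c: spurious nuclei formed during growth per seed."""
    return n_k * layers / (r_k * seed_conc)


ribbons = ribbons_from_tiles(1e-12, 1000)  # 1 picomole of each tile type, 1000 layers
max_fraction = 1.0 / ribbons               # at most one spurious crystal among them
example = spurious_fraction(n_k=1e-15, r_k=0.01, layers=1000, seed_conc=1e-10)
```

With the full value of Avogadro's number the ribbon count is about $6.02 \times 10^8$ and the allowed fraction about $1.66 \times 10^{-9}$, matching the rounded figures quoted in the text.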

The analysis and simulations in this section support the idea that nucleation con- 
trol using the zig-zag tile set not only works in theory, but should be practical. While 
in most respects our models appear complete, two effects which may be important 
in the actual process of assembly are not included. One such effect is tile depletion: 
while our model considers the concentration of free tiles to be constant, in a typical 
experiment tiles are used up because they join assemblies. Since the rate of spurious 
nucleation is concentration dependent, we would expect the rate of spurious nucle- 
ation to be larger at the beginning of a reaction, when almost all free tiles remain, 
than at the end, when many tiles are used up. Because of this effect, our simulations 
may actually overestimate the spurious nucleation rate in experimental systems. 

However, our simulations also neglect an important possible reaction pathway 
that may greatly increase the rate of spurious nucleation. While our model assumes 
tiles must be added to assemblies one at a time, in an experiment, small assemblies can 
also attach to each other. The formation and joining of several small assemblies may 
be faster than the spurious nucleation pathways described in this paper. A complete 
understanding of spurious nucleation of zig-zag tiles requires an understanding of the 
speed of spurious nucleation reactions caused by the joining of small assemblies. 

7. Conclusions. 

7.1. Nucleation of Algorithmic Self- Assembly. Our original motivation was 
to show that self-assembly programs that work in the aTAM, in which it is straight- 
forward to design tile sets that algorithmically assemble any computationally defined 
structure, can also be made to work in the more realistic kTAM. 

While tile sets that assemble correctly via unseeded growth in the aTAM with a
threshold of $\tau = 1$ will assemble correctly in the kTAM under the right conditions,
programs to assemble structures can be exponentially larger (in terms of number of
tile types) than those with a threshold of $\tau = 2$ [38]. However, tile sets that are
designed to assemble via seeded growth in the aTAM with a threshold $\tau = 2$ may fail
in the kTAM because mismatch, facet and spurious nucleation errors occur. These
problems are ameliorated in the limit of slow assembly speed [FT]. Other work has
described methods to control mismatch errors and facet errors without significant
slowdown [52] [7] [34]. Here, we have developed a construction that may be used
to correct the last discrepancy, spurious nucleation errors, again without significant
slowdown.

Fig. 7.1: Exponential amplification of assemblies. Probe strands assemble
onto a target sequence to create a seed assembly, which nucleates zig-zag growth.
Periodic fluid shear causes fragmentation of zig-zag assemblies, leading to exponential
amplification. The diagonal structure of the seed assembly shown here is a natural
shape for assembling tiles on a scaffold strand [37].

It remains to be formally proven that these constructions can be combined to 
control all types of errors simultaneously for any tile set of interest. No major dif- 
ficulties are expected, however, in large part because mismatch and facet errors can 
both be controlled by a single mechanism [7] and the control of spurious nucleation 
errors works independently of this mechanism. Both methods work by transforming 
an original tile set which works in the aTAM at $\tau = 2$ into a new (typically larger)
tile set that is more robust to particular kinds of errors in the kTAM. 

After this paper was submitted, experimental demonstrations of a decrease in
nucleation rates with ribbon width have supported the predictions made here [42].
Further experiments combining the techniques described here with proofreading
techniques, as predicted, resulted in algorithmic assembly where both mismatch error
rates and spurious nucleation error rates are low [4], and have enabled other
algorithmic self-assembly experiments [16]. It remains to be seen whether facet nucleation
rates can be lowered in experimental demonstrations of algorithmic self-assembly, but
the principal mechanism in the theoretical proposal for lowering facet nucleation error
rates [7] has been experimentally confirmed [8].

7.2. Detection of a Single DNA Molecule. Control over nucleation in algo- 
rithmic self-assembly can be seen as a special case of the detection of a single molecule. 
For a tile set of sufficiently large width, essentially nothing happens when no seed 
tiles are present, whereas if even a single seed tile is added, growth by self-assembly 
will result in a macroscopic assembly. Theorem 1 shows that the false-positive rate 
for detection can be made arbitrarily small by design; the false-negative rate in the 
kTAM is approximately 0. Although this idealized model does not consider many 
factors that could lead to poorer detection in a real system, we don't know of any 
insurmountable problems with implementing single-molecule detection this way. So
far, experiments have shown that seeded growth can be much faster than unseeded
growth, even when seeds are present at much lower concentrations than the elements 
of the zig-zag tile set [HI 2] , and no lower limit for detection with current technology 
has been established. 

There are, however, two immediate drawbacks. First, detecting seed-tile assem- 
blies is not as useful as detecting arbitrary DNA sequences. Second, the linear growth 
of a single zig-zag assembly would require a long time lapse before a macroscopic 
change is perceptible. As sketched in Figure 7.1, both obstacles appear surmountable.
First, as in [2H1 E3] , a set of strands can be designed to assemble double-crossover
molecules on a (sufficiently long) target strand with nearly arbitrary sequence, thus 
creating the seed assembly if and only if the target strand exists. Second, since 
fluid shear forces can fragment large DNA assemblies [20] , intermittent application of 
these forces could break large zig-zag assemblies, increasing the number of growing 
ends with each fragmentation episode. This fragmentation process can be expected 
to lead to exponential growth in the number of zig-zag assemblies without increasing 
the false-positive rate. (When a spuriously nucleated assembly does eventually form, 
of course, it will also be exponentially amplified.) 

Based on the analyses of the previous section, we can estimate the effectiveness of
this procedure. Is there a reasonable tile set width for which a single seed could amplify
to a level of detectability in a reasonably short time without any spurious nucleation
occurring within the given volume? Specifically, given a 10 μL reaction volume, a
minimum detection level of $10^5$ crystals and a protocol in which assemblies split after
growing on average to size 200 layers, we would like to determine the minimum time
and tile set width that meet these requirements. Creating $10^5$ crystals requires first
growing from the seed to size 200, then 17 cycles of fragmentation followed by growing
100 additional layers (50 on each side), so amplification requires $t_a = 1050/r_k$ seconds.
The expected time for the first nucleation event is $t_f = \frac{1}{n_k V A}$, and our criterion for
reliable detection is $t_f > 100\, t_a$, i.e., $\frac{n_k}{r_k} < \frac{1}{105000\, V A}$. Based on the estimates
described in Section 6.2, we use the approximation
$\log(n_k) + 2 = \frac{k+1}{2}\left(\log(r_k) - 1.7\right)$. Solving for $t_a$ as a function of $k$, we find that
good results are obtained for experimentally feasible widths. For example, with $k = 12$,
the time for reliable detection of a single seed in $V = 10$ μL (i.e. 0.17 attomolar
concentration) is $t_a \approx 26$ hours.
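The estimate can be reproduced with a short calculation. The sketch below assumes base-10 logarithms and a fitted slope of $(k+1)/2$ in the approximation for $\log(n_k)$; that slope is our assumption, chosen because it reproduces the roughly 26 hour figure quoted for $k = 12$. Function and parameter names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def fragmentation_cycles(target_crystals: float) -> int:
    """Doublings needed to go from one crystal to at least `target_crystals`."""
    return math.ceil(math.log2(target_crystals))


def amplification_layers(target_crystals: float = 1e5, split_size: int = 200) -> float:
    """Layer-times needed: grow to split_size, then regrow half a crystal from both ends per cycle."""
    return split_size + fragmentation_cycles(target_crystals) * split_size / 4


def amplification_time_s(k: int, volume_l: float, target_crystals: float = 1e5,
                         split_size: int = 200, safety: float = 100.0) -> float:
    """Solve t_f = safety * t_a together with log10(n_k) + 2 = slope * (log10(r_k) - 1.7)."""
    layers = amplification_layers(target_crystals, split_size)
    slope = (k + 1) / 2  # assumed fitted slope, see lead-in
    log_bound = -math.log10(safety * layers * volume_l * AVOGADRO)  # log10 of max n_k / r_k
    log_r = (1.7 * slope + 2.0 + log_bound) / (slope - 1.0)
    return layers / 10.0 ** log_r


hours = amplification_time_s(k=12, volume_l=1e-5) / 3600.0
```

The same function can be evaluated for other widths or volumes to see how quickly the required time falls as $k$ increases, bearing in mind that the result inherits the sensitivity of the linear fit noted in Section 6.4.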

7.3. Exponential Replication of Inheritable Information. The zig-zag 
constructions detailed in this paper propagate a single bit of information: the pres- 
ence or absence of the seed tile. Using a tile set that simply copies information, we 
could use the exponential amplification reaction to detect and identify one of several 
different target strands, by creating a tile set where the seed assemblies for each target 
strand contain a different pattern of 1s and 0s.

Furthermore, considering the amplification process as replication, the information
encoded in the strip's width can be seen as a form of inheritable information [41],
related to Graham Cairns-Smith's proposal for information replication within clays [5]
[6]. A zig-zag assembly replicates (in the appropriate culture medium consisting of
tiles) by growth of new layers followed by random fission [14]. Errors during growth,
such as bit flips as well as errors that increase or decrease the width of the assembly,
are inherited. If one sequence of tiles has a greater reproductive fitness than other
sequences (for example, by having a different growth or fission rate) then natural
evolution can be expected to occur. In principle, the right selective pressures on such
a process could induce the formation of arbitrarily complex crystal genotypes [43].

Appendix A. Mass-action kTAM satisfies Detailed Balance.

This section contains a proof that the mass-action kTAM used in this paper satisfies
detailed balance within $A_{2+}$. We prove two facts necessary to show this.

The proof also applies to the case, not considered in this paper, where different
tile types have different (but constant) concentrations. For a tile $t$, $S(t)$ is defined as
the relative concentration of its corresponding tile type. Unit concentration is $e^{-G_{mc}}$,
such that the concentration of tile type $t$ is $[t] = S(t)e^{-G_{mc}}$. Additionally, while only
equal strength sticky ends are considered in this work, this proof shows that detailed
balance applies to a model of self-assembly with arbitrary sticky end strengths.

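Lemma A.1. For all reaction pairs $A + t \to B + t$ and $B \to A$, $k_f[t][A]_{ss} = k_r[B]_{ss}$,
where $k_f$ and $k_r$ are the rates of the respective reactions.

Proof.

\begin{align*}
k_r[B]_{ss} &= k_f\, e^{G^\circ(B) - G^\circ(A)} [B]_{ss} \\
&= k_f\, e^{G^\circ(B) - G^\circ(A)} e^{-G(B)} \\
&= k_f\, e^{G^\circ(B) - G^\circ(A)} e^{-\left(G^\circ(B) + \sum_{t' \in B} (G_{mc} - \ln(S(t')))\right)} \\
&= k_f\, e^{-G^\circ(A)} e^{-\sum_{t' \in B} (G_{mc} - \ln(S(t')))}.
\end{align*}

Because $A + t = B$,

\begin{align*}
&= k_f\, e^{-G^\circ(A)} e^{-\sum_{t' \in A} (G_{mc} - \ln(S(t')))} e^{-G_{mc} + \ln(S(t))} \\
&= k_f\, e^{-G(A)} e^{-G_{mc} + \ln(S(t))} \\
&= k_f [A]_{ss} [t].
\end{align*}

□

Lemma A.2. For reaction pairs $t_1 + t_2 \to A$ and $A \to \emptyset$, $k_f[t_1][t_2] = k_r[A]_{ss}$.

Proof.

\begin{align*}
k_r[A]_{ss} &= k_f\, e^{G^\circ(A)} [A]_{ss} \\
&= k_f\, e^{G^\circ(A)} e^{-G(A)} \\
&= k_f\, e^{G^\circ(A)} e^{-\left(G^\circ(A) + 2G_{mc} - \ln(S(t_1)) - \ln(S(t_2))\right)} \\
&= k_f\, e^{-G_{mc} + \ln(S(t_1))} e^{-G_{mc} + \ln(S(t_2))} \\
&= k_f [t_1][t_2].
\end{align*}

□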
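The identity in Lemma A.1 can be spot-checked numerically. The sketch below assumes $G(A) = G^\circ(A) + \sum_{t' \in A}(G_{mc} - \ln S(t'))$, $[A]_{ss} = e^{-G(A)}$, and a reverse rate $k_r = k_f e^{G^\circ(B) - G^\circ(A)}$, with $G^\circ$ taken as minus the total bond strength; the function names and the sample energies are illustrative assumptions.

```python
import math
from typing import List, Tuple


def free_energy(g0: float, rel_concs: List[float], g_mc: float) -> float:
    """G(A) = G0(A) + sum over tiles t' in A of (G_mc - ln S(t'))."""
    return g0 + sum(g_mc - math.log(s) for s in rel_concs)


def steady_state(g0: float, rel_concs: List[float], g_mc: float) -> float:
    """Steady state concentration [A]_ss = e^(-G(A))."""
    return math.exp(-free_energy(g0, rel_concs, g_mc))


def attachment_fluxes(g0_a: float, tiles_a: List[float], s_t: float,
                      bond_energy: float, g_mc: float,
                      k_f: float = 1.0) -> Tuple[float, float]:
    """Return (k_f [t] [A]_ss, k_r [B]_ss) for B = A plus a tile of relative concentration s_t."""
    g0_b = g0_a - bond_energy                     # new bonds lower G0
    forward = k_f * s_t * math.exp(-g_mc) * steady_state(g0_a, tiles_a, g_mc)
    k_r = k_f * math.exp(g0_b - g0_a)             # detachment rate of the added tile
    reverse = k_r * steady_state(g0_b, tiles_a + [s_t], g_mc)
    return forward, reverse


forward, reverse = attachment_fluxes(-16.0, [1.0, 2.0, 0.5], 1.5, 17.0, 16.5)
```

The two fluxes agree to floating-point precision for any choice of bond energy and relative concentrations, which is the content of the lemma.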

Appendix B. Steady State Concentration as a Bound on Assembly
Concentration in a Powered Accretion Self-Assembly Model. This section
contains the proof of Lemma 5.2: In a mass-action powered accretion kTAM, if in the
initial state only single tiles have a positive concentration, then every assembly has a
concentration less than or equal to its steady state concentration at all time points.

Suppose that this lemma is not true. Then there is a time at which the
concentrations of one or more assemblies exceed their values at steady state. Since the
concentrations of all assemblies are zero initially, there must be a first time point $s$
at which for at least one assembly $A$, $[A] = [A]_{ss}$. At this time point, the
concentrations of all other assemblies are either at or below their respective steady state
concentrations. The rate of change of $[A]$ is given by Equation 3.1:

$$\frac{d[A]}{ds} = k_f \Bigg[ \sum_{\substack{A + t \to B + t, \\ B \to A \,\in\, R}} \left( e^{G^\circ(B) - G^\circ(A)}[B] - [A]e^{-G_{mc}} \right) + \sum_{\substack{B + t \to A + t, \\ A \to B \,\in\, R}} \left( [B]e^{-G_{mc}} - e^{G^\circ(A) - G^\circ(B)}[A] \right) + \sum_{\substack{t_1 + t_2 \to A + t_1 + t_2, \\ A \to \emptyset \,\in\, R}} \left( e^{-2G_{mc}} - e^{G^\circ(A)}[A] \right) \Bigg].$$

Consider a single term in the second summation, $[B]e^{-G_{mc}} - e^{G^\circ(A) - G^\circ(B)}[A]$,
involving some assembly $B$. We know that $[A]$ has reached its steady state
concentration, so $[A] = e^{-G(A)}$. By assumption, $[B] \le [B]_{ss} = e^{-G(B)}$. Assembly $A$ includes
one more tile, $t$, than does assembly $B$, so $G^\circ(A) - G^\circ(B) = G(A) - G(B) - G_{mc}$.
Therefore,

\begin{align*}
[B]e^{-G_{mc}} - e^{G^\circ(A) - G^\circ(B)}[A] &= [B]e^{-G_{mc}} - e^{G(A) - G(B) - G_{mc}}[A] \\
&= [B]e^{-G_{mc}} - e^{G(A) - G(B) - G_{mc}} e^{-G(A)} \\
&= [B]e^{-G_{mc}} - e^{-G(B)} e^{-G_{mc}} \\
&= e^{-G_{mc}} \left( [B] - e^{-G(B)} \right) \\
&\le 0.
\end{align*}

Similarly, for an assembly $B$ that is a term in the first summation, $B$ has the
extra tile $t$ so that $G^\circ(B) - G^\circ(A) = G(B) - G(A) - G_{mc}$. The term can be simplified
to

\begin{align*}
e^{G^\circ(B) - G^\circ(A)}[B] - [A]e^{-G_{mc}} &= e^{G(B) - G(A) - G_{mc}}[B] - e^{-G(A)} e^{-G_{mc}} \\
&\le e^{G(B) - G(A) - G_{mc}} e^{-G(B)} - e^{-G(A)} e^{-G_{mc}} \\
&= 0.
\end{align*}

The terms in the third summation are also non-positive, since

\begin{align*}
e^{-2G_{mc}} - e^{G^\circ(A)}[A] &= e^{-2G_{mc}} - e^{G^\circ(A)} e^{-G(A)} \\
&= e^{-2G_{mc}} - e^{G(A) - 2G_{mc}} e^{-G(A)} \\
&= 0.
\end{align*}

The change in concentration $\frac{d[A]}{ds}(s)$ is composed entirely of terms of this form. Since
each of these terms is non-positive, $\frac{d[A]}{ds}(s)$ is non-positive when $[A] = [A]_{ss}$. Thus,
$[A]$ can never rise above its steady state value.

As in Appendix A, this proof also applies to a model of self-assembly with
arbitrary stoichiometry and sticky end strengths.

Appendix C. Fast Convergence of Nucleation Rates at Steady State. 

This section contains a proof of Lemma 6.1 for zig-zag tile sets of width $k$.
We start by re-writing the lemma to use convenient notation to refer to the inner
sums within the series for $n_k^+$, which refer to the rate of spurious nucleation events
involving assemblies $A$ of width $k - 1$ and length $p$:

$$N_p = \sum_{\substack{A + t \to B + t \,\in\, R_k^n \\ \text{s.t. } \mathrm{length}(A) = p}} k_f [A]_{ss}\, e^{-G_{mc}},$$

such that $n_k^+ = \sum_{i=1}^{\infty} N_i$. Now, Lemma 6.1 may be stated as:

When $G_{se} > (\ln 10)(k - 2) + \ln 4$, $G_{mc} = 2G_{se} - \epsilon$, $0 < \epsilon < \frac{1}{2k-3}$, $k > 2$ and $l$ is even,
then $\sum_{p=l+1}^{\infty} N_p < 2N_l$.

To prove this lemma, we will prove two sub-lemmas. First, 

Lemma G.l. If G se > (ln4)(fc - 2) + ha^p G mc = 2G se - e, I is even and 

< e < 2^3, then N l+1 < \Ni. 

Proof. We will partition the assemblies of length $l + 1$ into classes corresponding
to assemblies of length $l$. We will then show the total spurious nucleation rate of
reactions containing the assemblies in each class is at least twice as small as the
spurious nucleation rate of reactions containing its corresponding assembly. The class
of assemblies of length $l + 1$ corresponding to an assembly $B$ of length $l$ will be denoted
$\bar{B}$.

To assign the assemblies to classes, we introduce a procedure that takes an
assembly $A$ of width $k - 1$ and length $l + 1$, and then "condenses" its right end to yield
an assembly $B$ with width $k - 1$ and length $l$. Specifically, $A$ and $B$ are identical
except for the last two columns of $A$ and the last column of $B$, and if $A$ had a tile in
either the ultimate or penultimate column in some particular row, then $B$ will have
a tile in its last column in the same row. Recall that for valid zig-zag assemblies, if a
tile is present in a particular spot, its tile type is determined by its neighbors. Thus,
we don't have to specify tile types in our condensation procedure, since there is no
choice. Formally, we say that $B = \mathrm{condensation}(A)$ if

$$\forall\, 0 \le a < k - 1,\; 0 \le b < l - 1 : A(a, b) = B(a, b), \text{ and}$$
$$\forall\, 0 \le a < k - 1 : B(a, l - 1) = \emptyset \text{ iff } A(a, l - 1) = A(a, l) = \emptyset.$$

Recall that the canonical representation of $A$ begins indexing sites at 0, so the
first column has index 0 and the last ($(l + 1)$st) column has index $l$. Also note that since
$l$ is even, $A$ cannot have a double tile extending into its last column, so no double
tiles are condensed.
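The condensation map can be made concrete with a small sketch. It assumes an assembly is represented simply as a set of occupied `(row, column)` sites, ignoring tile types (which the text notes are determined by neighbors) and double tiles (which do not arise in this case); the function name and the two example assemblies are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Site = Tuple[int, int]  # (row a, column b), both indexed from 0


def condense(sites: Iterable[Site], length: int) -> FrozenSet[Site]:
    """Merge the last two columns of an assembly of `length` = l + 1 columns.

    Columns 0 .. l-2 are copied unchanged; a row is occupied in the new last
    column l-1 iff it was occupied in column l-1 or column l of the input.
    """
    l = length - 1
    condensed = set()
    for row, col in sites:
        if not 0 <= col <= l:
            raise ValueError("site outside the stated length")
        condensed.add((row, col) if col < l - 1 else (row, l - 1))
    return frozenset(condensed)


# Two different assemblies of length 3 (l = 2) that condense to the same assembly.
a1 = {(0, 0), (0, 1), (1, 1), (1, 2)}
a2 = {(0, 0), (0, 1), (0, 2), (1, 2)}
b = condense(a1, 3)
```

The example illustrates that the map is many-to-one, which is what allows assemblies of length $l + 1$ to be grouped into classes indexed by assemblies of length $l$.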

To see that for every assembly A, condensation(A) is bound, note first that 
A is an assembly, so it is bound. Furthermore, the connectivity graph of B = 
condensation(A) (with a vertex for each tile and an edge for each abutting pair) is 
just a graph-theoretic contraction of the connectivity graph of A that combines any 
two vertices in the same row of the last two columns of A (then possibly adding some 
extra edges). Therefore, $B$ remains bound. Thus, each $A$ of length $l + 1$ is assigned
to a unique, valid assembly $B$ of length $l$.

Condensation is many-to-one, so there are many assemblies $A$ that condense onto
the same smaller assembly $B$. We assign $A$ to the class corresponding to the assembly
$\mathrm{condensation}(A)$, i.e., the class

$$\bar{B} = \{A : \mathrm{condensation}(A) = B\}.$$

For a given assembly $B$ of length $l$, the elements of $\bar{B}$, all of length $l + 1$, can be created
by adding $p$ tiles ($1 \le p \le k - 2$) to the $(l + 1)$st column of $B$, and then removing $h$
tiles ($0 \le h \le p - 1$) from the $l$th column.

Imagine making these changes one at a time, say from top to bottom, in each row
either moving or adding a tile. For each of the $p - h$ tiles that are added to the $(l + 1)$st
column where the corresponding tiles in the $l$th column are not removed, $p - h$ tiles
are added to the assembly and no more than $2(p - h) - 1$ bonds may be formed. For
the $h$ tiles that are moved from the $l$th to the $(l + 1)$st column, no tiles are added, and
no more bonds can be created (some might even be lost). Therefore, for each such
assembly $A$,

$$[A]_{ss} \le e^{-(p-h)G_{mc}}\, e^{(2(p-h)-1)G_{se}}\, [B]_{ss}.$$



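Let $l_A$ be the number of spurious nucleation reactions that an assembly $A$ is a
reactant of. The rate of spurious nucleation events involving assemblies of length $l + 1$
is therefore given by:

$$N_{l+1} = \sum_{\substack{A, C \in \mathcal{A} \\ \text{s.t. } A + t \to C + t \,\in\, R_k^n \\ \mathrm{length}(A) = l + 1}} k_f [A]_{ss}\, e^{-G_{mc}}.$$

We now partition this sum by summing over all smaller assemblies $B$, and then for
each $A \in \bar{B}$ (recall $\bar{B} = \{A \text{ s.t. } \mathrm{condensation}(A) = B\}$) we count the spurious
nucleation reactions:

$$N_{l+1} = \sum_{\substack{B \in \mathcal{A} \\ \text{s.t. } \mathrm{length}(B) = l}} \; \sum_{\substack{A \in \bar{B},\, C \in \mathcal{A} \\ \text{s.t. } A + t \to C + t \,\in\, R_k^n}} k_f [A]_{ss}\, e^{-G_{mc}} \;\le \sum_{\substack{A, B \in \mathcal{A} \\ \text{s.t. } \mathrm{condensation}(A) = B \\ \mathrm{length}(B) = l}} l_A\, k_f [A]_{ss}\, e^{-G_{mc}}.$$

Partitioning $\bar{B}$ according to the number of tiles added and moved, and using our
inequality for $[A]_{ss}$ in terms of $[B]_{ss}$, we have:

$$\le \sum_{\substack{B \in \mathcal{A} \\ \text{s.t. } \mathrm{length}(B) = l}} \sum_{p=1}^{k-2} \binom{k-2}{p} \sum_{h=0}^{p-1} \binom{p-1}{h}\, l_A\, k_f [B]_{ss}\, e^{-(p-h)G_{mc}}\, e^{(2(p-h)-1)G_{se}}\, e^{-G_{mc}}.$$

Under the conditions of the lemma, $G_{mc} > 2G_{se} - \frac{1}{2k-3}$, so that

\begin{align*}
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \sum_{p=1}^{k-2} \binom{k-2}{p} \sum_{h=0}^{p-1} \binom{p-1}{h}\, l_A\, k_f [B]_{ss}\, e^{-2(p-h)G_{se}}\, e^{\frac{p-h}{2k-3}}\, e^{(2(p-h)-1)G_{se}}\, e^{-G_{mc}} \\
&= \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \sum_{p=1}^{k-2} \binom{k-2}{p} \sum_{h=0}^{p-1} \binom{p-1}{h}\, l_A\, k_f [B]_{ss}\, e^{\frac{p-h}{2k-3}}\, e^{-G_{se}}\, e^{-G_{mc}} \\
&= \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} l_A\, k_f [B]_{ss} \sum_{p=1}^{k-2} \binom{k-2}{p}\, e^{\frac{1}{2k-3}} \sum_{h=0}^{p-1} \binom{p-1}{h}\, e^{\frac{p-1-h}{2k-3}}\, e^{-G_{se}}\, e^{-G_{mc}}.
\end{align*}

Noting that the inner sums are binomial expansions (e.g. $(1 + x)^n = \sum_{i=0}^{n} \binom{n}{i} x^i$)
or portions thereof, we can simplify further:

$$= \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} l_A\, k_f [B]_{ss} \sum_{p=1}^{k-2} \binom{k-2}{p}\, e^{\frac{1}{2k-3}}\, e^{-G_{se}} \left(1 + e^{\frac{1}{2k-3}}\right)^{p-1} e^{-G_{mc}}.$$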



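Since for $k > 2$, $\frac{2}{5} < \left(1 + e^{\frac{1}{2k-3}}\right)^{-1} < \frac{1}{2}$,

\begin{align*}
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{2}\, l_A\, k_f [B]_{ss} \sum_{p=1}^{k-2} \binom{k-2}{p}\, e^{\frac{1}{2k-3}} \left(1 + e^{\frac{1}{2k-3}}\right)^{p} e^{-G_{se}}\, e^{-G_{mc}} \\
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{2}\, l_A\, k_f [B]_{ss} \sum_{p=1}^{k-2} \binom{k-2}{p}\, e^{\frac{1}{2k-3}}\, 2^{p}\, e^{\frac{p}{2k-3}}\, e^{-G_{se}}\, e^{-G_{mc}} \\
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{2}\, l_A\, k_f [B]_{ss}\, e^{\frac{1}{2k-3}} \left(1 + 2e^{\frac{1}{2k-3}}\right)^{k-2} e^{-G_{se}}\, e^{-G_{mc}}.
\end{align*}

Similarly, for $k > 2$, $\left(1 + 2e^{\frac{1}{2k-3}}\right) < 4$, and $l_A \le l_B + 1$ since the longer assembly $A$
can have at most one more spurious nucleation reaction than $B$, so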

I' 



< J2 + l)k f [B] ss e^ k -Ve- G -e- G » 



B s.t. 

length(B)—l 

< \lBk } [B]sse l ^ k - 2) e- G -e- G ^ 

B s.t. 

length(B)—l 



When G se > ln(4)(fc - 2) + ln(f ), 



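$$< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{2}\, l_B\, k_f [B]_{ss}\, e^{-G_{mc}} = \frac{1}{2} \sum_{\substack{A + t \to B + t \,\in\, R_k^n \\ \text{s.t. } \mathrm{length}(A) = l}} k_f [A]_{ss}\, e^{-G_{mc}} = \frac{1}{2} N_l.$$

□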

The above sub-lemma takes care of the smaller odd terms, but to show that the 
entire summation is bounded, we show that the smaller even terms are also bounded. 

Lemma C.2. If $G_{se} > \ln(10)(k - 2) + \ln(4)$, $G_{mc} > 2G_{se} - \frac{1}{2k-3}$, $k > 2$ and $l$
is even, then $N_{l+2} < \frac{1}{2} N_l$.

Proof. The proof for this sub-lemma is similar to that for Lemma C.1, except
that the condensation function is defined so that the presence of a double tile in the
$(l + 1)$st and $(l + 2)$nd columns is taken into account.

Here, we use a procedure that takes an assembly A of width k — 1 and length I + 2, 
and then condenses its right end to yield an assembly B with width k — 1 and length 
I. Again, A and B are identical except for the rightmost three columns of A and the 
last column of B and if A has a tile in any of the last three columns in some particular 
row, then B will have a tile in its last column in the same row. An added detail is 
that we must now consider that the rightmost two columns of A may contain a double 
tile; in this case, the rightmost two columns of B must have a double tile also. The 
double tile may either be on the top or on the bottom; without loss of generality, we 
assume it is on the bottom, since the other case can be treated identically Again, the 
tile types of the new tiles in B are determined by their neighbors. Formally, we say 
that B = condensation' (A) if 

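$$
\begin{aligned}
&\forall\, 0 \le a < k - 1,\ 0 \le b < l - 1,\ (a, b) \ne (k - 2, l - 2) :\ A(a, b) = B(a, b), \text{ and} \\
&\forall\, 0 \le a < k - 1 :\ B(a, l - 1) = \emptyset \text{ iff } A(a, l - 1) = A(a, l) = A(a, l + 1) = \emptyset, \text{ and} \\
&B(k - 2, l - 2) = \emptyset \text{ iff } A(k - 2, l - 2) = A(k - 2, l) = \emptyset.
\end{aligned}
$$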
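The occupancy rules in this definition can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: an assembly is represented as a set of occupied (row, column) sites, tile types and double-tile bookkeeping are ignored, the indices follow the reading of the definition given above, and the function name `condense_occupancy` is hypothetical rather than taken from the paper.

```python
def condense_occupancy(A, k, l):
    """Occupied sites of B = condensation'(A), tracking occupancy only.

    A is a set of occupied (row, col) sites of an assembly of width k-1
    (rows 0..k-2) and length l+2 (columns 0..l+1). Tile types are ignored;
    this representation is an assumption made for illustration.
    """
    B = set()
    for a in range(k - 1):
        # Columns 0..l-2 are copied unchanged, except the special site (k-2, l-2).
        for b in range(l - 1):
            if (a, b) != (k - 2, l - 2) and (a, b) in A:
                B.add((a, b))
        # The last column of B is empty in row a iff A is empty in row a
        # in all of its last three columns l-1, l and l+1.
        if any((a, c) in A for c in (l - 1, l, l + 1)):
            B.add((a, l - 1))
    # Site (k-2, l-2) of B is empty iff A(k-2, l-2) and A(k-2, l) are both empty.
    if (k - 2, l - 2) in A or (k - 2, l) in A:
        B.add((k - 2, l - 2))
    return B


# Hypothetical example: k = 3 (two rows), l = 2, so A has four columns.
example_A = {(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (1, 3)}
print(sorted(condense_occupancy(example_A, k=3, l=2)))
```

The result always fits in columns $0$ through $l - 1$, which is the sense in which the right end of $A$ is condensed.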



The proof that every assembly $A$ has a bound condensation' is virtually identical
to the proof in the previous lemma. The rest of the proof is also similar, except that
different numbers of tiles may be removed from the $(l+1)$st and $(l+2)$nd columns.

For a given assembly $A$, creating $A$ from $B$, where $\mathrm{condensation}'(A) = B$,
requires adding $p$ tiles, $1 \le p \le 2k - 3$, to the $(l+1)$st and $(l+2)$nd columns of $B$,
and then removing $h$ tiles, $1 \le h \le k - 1$, from the $l$th column.

For the $p - h$ tiles that are added to the $(l+1)$st and $(l+2)$nd columns where the
corresponding tiles in the $l$th column are not removed, $p - h$ tiles are added to the
assembly and no more than $2(p - h) - 1$ bonds may be formed. For the $h$ tiles that are
moved from the $l$th column to the $(l+1)$st or $(l+2)$nd column, no tiles are added, and
no more bonds can be created.

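Thus, the spurious nucleation rate of these assemblies is given by:

$$
\begin{aligned}
N_{l+2} &= \sum_{\substack{\text{spurious nucleation reactions of } A \\ \text{s.t. } \mathrm{length}(A) = l+2}} k_f [A]_{ss}\, e^{-G_{mc}} \\
&\le \sum_{\substack{A, B \\ \text{s.t. } \mathrm{condensation}'(A) = B \\ \mathrm{length}(B) = l}} l_A\, k_f [A]_{ss}\, e^{-G_{mc}} \\
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \sum_{p=1}^{2k-3} \binom{2k-3}{p} \sum_{h=1}^{k-1} \binom{k-2}{h}\, l_A\, k_f [B]_{ss}\, e^{-G_{mc}}\, e^{-(p-h) G_{mc}}\, e^{(2(p-h)-1) G_{se}}.
\end{aligned}
$$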
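When $G_{mc} > 2G_{se} - \frac{1}{2k-3}$, this similarly reduces to

$$
\begin{aligned}
N_{l+2} &< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} l_A\, k_f [B]_{ss}\, e^{-G_{mc}}\, e^{-G_{se}} \left(1 + e^{-\frac{1}{2k-3}}\right)^{k-2} \left(1 + e^{\frac{1}{2k-3}}\right)^{2k-3} \\
&< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} l_A\, k_f [B]_{ss}\, e^{-G_{mc}}\, e^{-G_{se}} \left(\left(1 + e^{-\frac{1}{2k-3}}\right)\left(1 + e^{\frac{1}{2k-3}}\right)^{2}\right)^{k-2}.
\end{aligned}
$$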

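For $k > 2$, $\left(1 + e^{-\frac{1}{2k-3}}\right)\left(1 + e^{\frac{1}{2k-3}}\right)^{2} < 10$, and thus

$$
N_{l+2} < \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} l_A\, k_f [B]_{ss}\, e^{-G_{mc}}\, e^{-G_{se}}\, 10^{k-2}.
$$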

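Therefore, when $G_{se} > \ln(10)(k - 2) + \ln(4)$, and recalling that $l_A \le l_B + 1$,

$$
\begin{aligned}
N_{l+2} &< \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{4} (l_B + 1)\, k_f [B]_{ss}\, e^{-G_{mc}} \\
&\le \sum_{\substack{B \text{ s.t.} \\ \mathrm{length}(B) = l}} \tfrac{1}{2}\, l_B\, k_f [B]_{ss}\, e^{-G_{mc}} \\
&= \tfrac{1}{2} \sum_{\substack{\text{spurious nucleation reactions of } A \\ \text{s.t. } \mathrm{length}(A) = l}} k_f [A]_{ss}\, e^{-G_{mc}} = \tfrac{1}{2} N_l.
\end{aligned}
$$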

□ 
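The numerical constants used in the last steps of this proof can be spot-checked with a few lines of Python. This is a sanity check under stated assumptions, not part of the proof: the exponent $1/(2k-3)$ is taken from the condition on $G_{mc}$ as read here, the value of $G_{se}$ is placed just above the threshold $\ln(10)(k-2) + \ln(4)$, and the function names are hypothetical.

```python
import math


def final_step_holds(k, l_B, margin=1e-9):
    """Check the last two inequalities of the proof for one (k, l_B) pair."""
    # Smallest G_se allowed by the hypothesis, nudged just above the threshold.
    g_se = math.log(10) * (k - 2) + math.log(4) + margin
    # The hypothesis on G_se should give 10^(k-2) * exp(-G_se) < 1/4.
    factor = 10 ** (k - 2) * math.exp(-g_se)
    # With l_A <= l_B + 1, the step also needs (l_B + 1)/4 <= l_B/2,
    # which holds whenever l_B >= 1.
    return factor < 0.25 and (l_B + 1) / 4 <= l_B / 2


def constant_bound_holds(k):
    """Check (1 + e^-x)(1 + e^x)^2 < 10 for the ASSUMED exponent x = 1/(2k-3)."""
    x = 1.0 / (2 * k - 3)
    return (1 + math.exp(-x)) * (1 + math.exp(x)) ** 2 < 10


checks = all(final_step_holds(k, l_B) for k in range(3, 40) for l_B in range(1, 50))
constants = all(constant_bound_holds(k) for k in range(3, 1000))
print(checks, constants)
```

The check covers only a finite range of $k$ and $l_B$; it is meant to catch transcription slips in the constants, not to replace the argument.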

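Now, we can combine Lemma C.1 and Lemma C.2 to derive Lemma 6.1. If $l$ is
even,

$$
\begin{aligned}
\sum_{p=l+1}^{\infty} N_p &= N_{l+1} + N_{l+2} + N_{l+3} + N_{l+4} + \cdots \\
&< \tfrac{1}{2} N_l + \tfrac{1}{2} N_l + \tfrac{1}{4} N_l + \tfrac{1}{4} N_l + \cdots \\
&= 2 N_l.
\end{aligned}
$$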
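The bounding series can be tabulated exactly to see why its total is $2N_l$. The sketch below assumes only the pattern shown above, in which the terms $N_{l+1}$ and $N_{l+2}$ are each bounded by $\frac{1}{2} N_l$, the next two by $\frac{1}{4} N_l$, and so on; the helper names are hypothetical.

```python
from fractions import Fraction


def bound_coefficients(n_terms):
    """Coefficients c_j with N_{l+j} < c_j * N_l for j = 1..n_terms (l even)."""
    # Terms l+1 and l+2 get 1/2, terms l+3 and l+4 get 1/4, and so on.
    return [Fraction(1, 2 ** ((j + 1) // 2)) for j in range(1, n_terms + 1)]


def partial_sum(n_terms):
    """Exact partial sum of the bounding coefficients."""
    return sum(bound_coefficients(n_terms), Fraction(0))


for n in (2, 4, 10, 40):
    # The gap to 2 halves with every additional pair of terms.
    print(n, partial_sum(n), float(2 - partial_sum(n)))
```

Every partial sum stays strictly below 2, and after $2m$ terms it equals $2(1 - 2^{-m})$, so the coefficients sum to 2 in the limit.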